"Fossies" - the Fresh Open Source Software Archive

Member "freeha-1.0/RUNNING" (23 Nov 2006, 4951 Bytes) of package /linux/privat/old/freeha-1.0.tar.gz:


                    FreeHA Operations Manual

In normal operation, all nodes will be booted up and have the freehad demon
started. When all nodes are up, freehad will determine the node that has
the **alphabetically first** uname, and start services on that node.

If you have more than 2 nodes, and the 'first' node starts more slowly than
the others, it is possible that services will be auto-started on one of the
other faster-starting nodes.

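To predict which node the cluster will treat as 'first', you can sort the
node names yourself. This is only a sketch: the node names are hypothetical,
and it assumes freehad compares names in plain byte order, which this manual
does not state.

```shell
#!/bin/sh
# Print the name that sorts first among the given node names.
# Assumption: plain byte-order comparison (LC_ALL=C). The exact comparison
# rule used by freehad is not documented in this manual.
first_node() {
    printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

# Hypothetical node names, for illustration only.
first_node nodeb nodea nodec    # prints "nodea"
```

Note that in byte order, uppercase letters sort before lowercase ones.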


SERVICES
-----------

To determine what services are run on the active node, look at
the 'starthasrv' script, in the main BINDIR for the demon.
[ usually /opt/freeha/bin/starthasrv ]

Read the "INSTALL" file, under the "SETTING UP CLUSTERED SERVICES" section,
for more details on configuring services.


STARTING AND STOPPING
----------------------

A regular kill of the process will cleanly stop services and
shut down the demon. This is how services should be stopped on the primary
node if you wish to fail over to a secondary node.

The other nodes will view the stopped node as being in a 'STOPPING' state
until it times out, after which services will be started on another node.
This is a 'feature', in that it gives you time to decide "oops, I screwed up"
and restart the demon on the node you just killed it on.

If you restart the demon after services have successfully been transitioned
to another node, they will safely remain on that other node without
interference.

  o o o o o o o

To cleanly stop services on a node, but still leave the demon running, send
a HUP signal to the demon, with "kill -HUP {pid-of-demon}"

This will put the demon on the current node into 'STOPPING' state,
and the stophasrv script will be called.

After shutdown of services, the "cluster" will try to start services
up on the alphabetically first node that is still visible.
Sending a HUP signal to the first node in the cluster is therefore most
likely a waste of your time, because that same node is still visible and
will be picked again. HUP is primarily useful for failing the
services back to the "primary" node (the "first" node).

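The {pid-of-demon} placeholder above can be filled in by looking the process
up by name. A minimal sketch, assuming pgrep is installed and the demon runs
under the process name freehad; the function name signal_freehad is made up
for this example.

```shell
#!/bin/sh
# Send a signal to the running freehad, found by process name.
# Assumptions: pgrep is available, and only one freehad is running
# (if several match, every one of them receives the signal).
signal_freehad() {
    sig=$1
    pid=$(pgrep -x freehad) || {
        echo "freehad is not running" >&2
        return 1
    }
    # $pid is left unquoted on purpose so multiple pids split into words.
    kill -s "$sig" $pid
}

# Usage: release services on this node but keep the demon running.
#   signal_freehad HUP
```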

If you wish to fail service from the primary node to the secondary node,
then 'fail' is exactly what is necessary.
Use the "stophasrv" script to do a *clean* shutdown of services on node 1.
Since the demon itself did not initiate the shutdown, it will interpret
that as a "failure" of services. A secondary node will then
shortly take over services.

 o o o o o o o o o o o o o o o o o o

To force starting services by an already running demon, send a USR1 signal.
e.g.:  "kill -USR1 3456"   (where 3456 is the pid of the demon)

THIS DOES NOT CHECK OTHER NODES TO SEE IF SERVICES ARE ALREADY RUNNING.
ALSO, THIS CLEARS THE ERROR FLAG.

To force starting services when starting up freehad, use the -m flag.
As with the USR1 signal,
THIS DOES NOT CHECK OTHER NODES TO SEE IF SERVICES ARE ALREADY RUNNING.

If you just wish to clear the error flag, send a USR2 signal:
"kill -USR2 {freehad-pid}"

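The signals described in this section can be summarized as a small lookup.
The action names (stop, release, force-start, clear-error) are invented for
this sketch; only the signals themselves come from the text above. "stop"
maps to TERM because that is what a plain kill sends by default.

```shell
#!/bin/sh
# Map a (hypothetical) action name to the signal freehad expects for it.
freeha_signal() {
    case $1 in
        stop)        echo TERM ;;  # stop services and shut down the demon
        release)     echo HUP  ;;  # stop services, keep the demon running
        force-start) echo USR1 ;;  # start services WITHOUT checking other nodes
        clear-error) echo USR2 ;;  # only clear the error flag
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   kill -s "$(freeha_signal release)" {pid-of-demon}
```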

STATUS OF NODES
----------------------

Current status of all nodes can be seen in /var/run/freeha, or if that
does not exist, /var/freeha/state, or wherever you specify with the -l flag.

Cute trick to run in a window (substitute the path of your own state file
if it differs):

 while true ; do clear ; cat /var/run/freehad ; sleep 1 ; done

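The fallback order described above can be folded into the watch loop.
A sketch, assuming the two default locations named in this section; the
helper name first_existing is hypothetical, and if you start freehad with
-l you would pass that path instead.

```shell
#!/bin/sh
# Print the first path among the arguments that exists; fail if none does.
first_existing() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Usage (runs until interrupted with Ctrl-C):
#   state=$(first_existing /var/run/freeha /var/freeha/state) &&
#       while true; do clear; cat "$state"; sleep 1; done
```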

CLEARING ERRORED STATE
-------------------------

Currently, there are two ways to clear a node of the 'ERRORED' state:

  1. **force-start** services on it, with the USR1 signal
  2. restart the demon

In addition, the USR2 signal described under STARTING AND STOPPING clears
the error flag without starting services.


LOGGING
--------------------

FreeHA will use syslog to record major changes of state, if you have
USE_SYSLOG defined in the Makefile.
By default, it logs as facility LOCAL1. Edit the source to change this if you
desire.

Note that the freehad demon does not currently detach itself from a tty
if you run it by hand. (So it is not technically a 'demon' yet :-)
Similarly, it uses plain old system() to call the
start script. So if necessary, be sure to redirect anything in your
starthasrv script, as
  prog </dev/null >/dev/null 2>&1  &
if you don't want associations between the demon and your program being run.

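Inside starthasrv, the redirection above can be wrapped in a small helper so
that every service is launched the same way. This is a sketch, not part of
FreeHA: the function name start_detached and the service path in the usage
comment are hypothetical.

```shell
#!/bin/sh
# Run a command in the background with stdin, stdout and stderr all
# pointed at /dev/null, so it keeps no tie to the caller's terminal.
start_detached() {
    "$@" </dev/null >/dev/null 2>&1 &
}

# Usage, with a made-up service command:
#   start_detached /usr/local/sbin/myservice --foreground
```

Anything the service would have printed is discarded, so make sure it logs
somewhere of its own if you need its output.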

AVOIDING 'SPLIT-BRAIN' SYNDROME
--------------------------------

With a two-node cluster, there is always a possibility of the 'split-brain'
problem: each node loses contact with the other, and then each assumes it
needs to start up services.
If you have multiple private network connections between them, then this is a
fairly unlikely possibility. But if you are DETERMINED to avoid this
situation, then I suggest you allocate a third node that will not actually
run services, but simply serves as a 'neutral arbitrator' of who gets to run
services in this split-brain situation.

BUGS ON STARTUP
--------------------
As I mention elsewhere, the 'demon' is not fully a demon. This means
that sometimes child processes somehow get associated with the demon's
listening socket. So if you get an error like

  ERROR trying to bind listen socket: Address already in use

you may have to kill services on that machine by hand before starting
the demon again.

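To find out which leftover processes are holding the socket, a tool such as
lsof can help. A sketch only: it assumes lsof is installed, and since this
manual does not say which port freehad listens on, the port is left as an
argument you must supply. The function name port_holders is made up.

```shell
#!/bin/sh
# List processes that have the given port open (any protocol).
# Assumption: lsof is installed. The port is whatever your freehad
# listens on; it is not stated in this manual.
port_holders() {
    if [ $# -ne 1 ]; then
        echo "usage: port_holders PORT" >&2
        return 2
    fi
    lsof -nP -i ":$1"
}

# Usage:
#   port_holders {freehad-port}
# then kill the listed service processes by hand and restart the demon.
```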