"Fossies" - the Fresh Open Source Software Archive 
Member "freeha-1.0/RUNNING" (23 Nov 2006, 4951 Bytes) of package /linux/privat/old/freeha-1.0.tar.gz:
FreeHA Operations Manual

In normal operation, all nodes are booted up and have the freehad demon
started. When all nodes are up, freehad determines which node has
the **alphabetically first** uname, and starts services on that node.

If you have more than 2 nodes, and the 'first' node starts more slowly than
the others, it is possible that services will be auto-started on one of the
other, faster-starting nodes instead.

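To make the "alphabetically first" rule concrete, here is a small shell
sketch that picks the first name out of a list of node names. The function
name first_node and the node names are made up for this example, and it
assumes plain byte-order (LC_ALL=C) sorting. freehad does its own comparison
internally, which may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper, not part of FreeHA: print the name that sorts
# first among the arguments, using byte-order (C locale) comparison.
first_node() {
    printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

# Example: compare the local node name against two made-up peer names.
# first_node "$(uname -n)" nodeb nodec
```

If the name printed is the local node's own name, that node is the one you
would expect to end up running services once everything is up.
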
SERVICES
-----------

To determine what services are run on the active node, look at
the 'starthasrv' script, in the main BINDIR for the demon.
[ usually, /opt/freeha/bin/starthasrv ]

Read the "INSTALL" file, under the "SETTING UP CLUSTERED SERVICES" section,
for more details on configuring services.

STARTING AND STOPPING
----------------------

A regular kill of the process (plain "kill {pid-of-demon}", which sends the
default TERM signal) will cleanly stop services, and shut down the demon.
This is how services should be stopped on the primary node, if you wish to
fail over to a secondary node.

The other nodes will view the stopped node as being in a 'STOPPING' state
until it times out, after which services will be started on another node.
This is a 'feature', in that it gives you time to decide "oops, I screwed up"
and restart the demon on the node you just killed it on.

If you restart the demon after services have successfully been transitioned
to another node, they will safely remain on that other node without
interference.

o o o o o o o

To cleanly stop services on a node, but still leave the demon running, send
a HUP signal to the demon, with "kill -HUP {pid-of-demon}"

This will put the demon on the current node into 'STOPPING' state,
and the stophasrv script will be called.

After shutdown of services, the "cluster" will try to start services
up on the alphabetically first node that is still visible.
This means that sending a HUP signal to the first node in the cluster is
possibly a waste of your time, since services may simply be started on it
again! HUP is primarily useful for failing the services back to the
"primary" node (the "first" node).

If you wish to fail service from the primary node to the secondary node,
then 'fail' is exactly what is necessary.
Use the "stophasrv" script to do a *clean* shutdown of services on node 1.
Since the demon itself did not initiate the shutdown, it will interpret
that as a "failure" of services. A secondary node will then
shortly take over services.

o o o o o o o o o o o o o o o o o o

To force an already running demon to start services, send it a USR1 signal.
eg: "kill -USR1 3456" (where 3456 is the pid of the demon)

THIS DOES NOT CHECK OTHER NODES TO SEE IF SERVICES ARE ALREADY RUNNING.
ALSO, THIS CLEARS THE ERROR FLAG.

To force starting services when starting up freehad, use the -m flag.
As with the USR1 signal,
THIS DOES NOT CHECK OTHER NODES TO SEE IF SERVICES ARE ALREADY RUNNING.

If you just wish to clear the error flag, use "kill -USR2 {freehad-pid}"

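Since there are now four different signals to remember, here is a rough
sketch of a wrapper that maps each operation in this section onto its
signal. The names freeha_signal_for and freeha_ctl, and the action words,
are invented for this example and are not shipped with FreeHA. It also
assumes that a "regular kill" means the default TERM signal.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of FreeHA. It only maps the operations
# described in this section onto the signal names given in the text.
freeha_signal_for() {
    case "$1" in
        stop)        echo TERM ;;  # regular kill: stop services, exit demon
        release)     echo HUP  ;;  # stop services, keep the demon running
        force-start) echo USR1 ;;  # start services WITHOUT checking peers
        clear-error) echo USR2 ;;  # only clear the error flag
        *)           return 1  ;;
    esac
}

# Usage: freeha_ctl ACTION PID
freeha_ctl() {
    sig=$(freeha_signal_for "$1") || {
        echo "usage: freeha_ctl {stop|release|force-start|clear-error} PID" >&2
        return 2
    }
    [ -n "$2" ] || { echo "freeha_ctl: missing pid" >&2; return 2; }
    kill -s "$sig" "$2"
}
```

For example, "freeha_ctl release 3456" would send HUP to pid 3456. The
warning about force-start still applies: nothing here checks the other nodes.
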
STATUS OF NODES
----------------------

Current status of all nodes can be seen in /var/run/freeha, or if that
does not exist, /var/freeha/state, or wherever you specify with the -l flag.

Cute trick to run in a window:

    while true ; do clear ; cat /var/run/freehad ; sleep 1 ; done

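Because the state file can live in more than one place, a script that wants
to read it has to go looking. Here is a minimal sketch that prints the first
candidate that exists as a regular file. The function name find_state_file
is made up, and the candidate list in the example is simply every path named
in this section; adjust it to match your install, and put your -l path first
if you use one.

```shell
#!/bin/sh
# Hypothetical helper, not part of FreeHA: print the first argument that
# exists as a regular file, or return 1 if none of them do.
find_state_file() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Example, using the locations named in this section:
# state=$(find_state_file /var/run/freeha /var/run/freehad /var/freeha/state) &&
#     cat "$state"
```
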
CLEARING ERRORED STATE
-------------------------

Currently, there are three ways to clear a node of the 'ERRORED' state:

1. **force-start** services on it, with the USR1 signal
2. clear just the error flag, with the USR2 signal
3. restart the demon

LOGGING
--------------------

FreeHA will use syslog to record major changes of state, if you have
USE_SYSLOG defined in the Makefile.
By default, it logs as facility LOCAL1. Edit the source to change this if you
desire.

Note that the freehad demon does not currently detach itself from a tty,
if you run it by hand. (So it is not technically a 'demon' yet :-)
Similarly, it uses plain old system() to call the
start script. So if necessary, be sure to redirect anything in your
starthasrv script, as
    prog </dev/null >/dev/null 2>&1 &
if you don't want associations between the demon and your program being run.

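If your starthasrv script launches several programs, the redirect above can
be wrapped in a small function so you don't forget a piece of it. This is
only a sketch: the name start_detached and the service path in the example
are made up.

```shell
#!/bin/sh
# Hypothetical helper for use inside a starthasrv script: run a command
# in the background with stdin, stdout and stderr all pointed at /dev/null.
start_detached() {
    "$@" </dev/null >/dev/null 2>&1 &
}

# Example (the service path is made up):
# start_detached /opt/myservice/bin/myserviced
```

Note that this only covers the three standard streams. Any other file
descriptors the child inherits are not touched.
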
AVOIDING 'SPLIT-BRAIN' SYNDROME
--------------------------------

With a two-node cluster, there is always a possibility of the 'split-brain'
problem: each node loses contact with the other, and then each assumes it
needs to start up services.
If you have multiple private network connections between them, then this is a
fairly unlikely possibility. But, if you are DETERMINED to avoid this
situation, then I suggest you allocate a third node, that will not actually
run services, but simply serves as a 'neutral arbitrator' of who gets to run
services in this split-brain situation.

BUGS ON STARTUP
--------------------
As I mention elsewhere, the 'demon' is not fully a demon, which means
that child processes sometimes get associated with the demon's
listening socket. So if you get an error like

    ERROR trying to bind listen socket: Address already in use

you may have to kill services on that machine by hand, before starting
the demon again.