"Fossies" - the Fresh Open Source Software Archive
Member "pacemaker-Pacemaker-2.1.2/doc/crm_fencing.txt" (24 Nov 2021, 16932 Bytes) of package /linux/misc/pacemaker-Pacemaker-2.1.2.tar.gz:
As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard
) with prefixed line numbers.
Alternatively you can here view
the uninterpreted source code file.
See also the last Fossies "Diffs"
side-by-side code changes report for "crm_fencing.txt": 2.1.0_vs_2.1.1
Fencing and Stonith
===================
Dejan_Muhamedagic <firstname.lastname@example.org>
Fencing is a very important concept in computer clusters for HA
(High Availability). Unfortunately, given that fencing does not
offer a visible service to users, it is often neglected.

Fencing may be defined as a method to bring an HA cluster to a
known state. But, what is a "cluster state" after all? To answer
that question we have to see what is in the cluster.
== Introduction to HA clusters

Any computer cluster may be loosely defined as a collection of
cooperating computers or nodes. Nodes talk to each other over
communication channels, which are typically standard network
connections, such as Ethernet.
The main purpose of an HA cluster is to manage user services.
Typical examples of user services are an Apache web server or,
say, a MySQL database. From the user's point of view, the
services do some specific and hopefully useful work when ordered
to do so. To the cluster, however, they are just things which may
be started or stopped. This distinction is important, because the
nature of the service is irrelevant to the cluster. In cluster
lingo, the user services are known as resources.
Every resource has a state attached, for instance: "resource r1
is started on node1". In an HA cluster, such a state implies that
"resource r1 is stopped on all nodes but node1", because an HA
cluster must make sure that every resource may be started on at
most one node.

A collection of resource states and node states is the cluster
state.
Every node must report every change that happens to resources.
This may happen only for the running resources, because a node
should not start resources unless told to do so by somebody. That
somebody is the Cluster Resource Manager (CRM) in our case.
So far so good. But what if, for whatever reason, we cannot
establish with certainty the state of some node or resource? This
is where fencing comes in. With fencing, even when the cluster
doesn't know what is happening on some node, we can make sure
that the node doesn't run any resources, or at least any
important ones.
If you wonder how this can happen, there are many risks involved
with computing: reckless people, power outages, natural
disasters, rodents, thieves, software bugs, just to name a few.
We are sure that at least a few times your computer has failed
unexpectedly.
== Fencing

There are two kinds of fencing: resource level and node level.

Using resource level fencing, the cluster can make sure that a
node cannot access one or more resources. One typical example is
a SAN, where a fencing operation changes rules on a SAN switch to
deny access from the node.
Resource level fencing may also be achieved using normal
resources on which the resource we want to protect would depend.
Such a resource would simply refuse to start on the node to be
fenced, and therefore resources which depend on it will be
unrunnable on the same node as well, as sketched below.
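To make this concrete, here is a minimal sketch in crm syntax.
The names are illustrative only: ocf:heartbeat:Dummy stands in
for a real agent that would switch this node's SAN access on or
off, and the Filesystem parameters are made up:

# stand-in "gate" resource; in real life this would be an agent
# that enables or disables this node's access to the SAN
primitive san-gate ocf:heartbeat:Dummy
primitive fs ocf:heartbeat:Filesystem \
    params device="/dev/sdb1" directory="/srv/data" fstype="ext4"
# fs may only run where san-gate runs, and only after it
colocation fs-with-gate inf: fs san-gate
order gate-before-fs inf: san-gate fs

If san-gate cannot start on a node, fs is fenced off that node
too.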
Node level fencing makes sure that a node does not run any
resources at all. This is usually done in a very simple, yet
brutal way: the node is simply reset using a power switch. This
may ultimately be necessary because the node may not be
responsive at all.

Node level fencing is our primary subject below.
== Node level fencing devices

Before we get into the configuration details, you need to pick a
fencing device for the node level fencing. There are quite a few
to choose from. If you want to see the list of stonith devices
which are supported, just run:

stonith -L
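The output is simply a list of plugin names, one per line; what
exactly you see depends on the packages installed. A trimmed,
illustrative sample:

apcmaster
apcsmart
external/ibmrsa-telnet
external/ipmi
external/sbd
external/ssh
ibmhmc
ipmilan
meatware
null
suicide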
Stonith devices may be classified into five categories:

- UPS (Uninterruptible Power Supply)

- PDU (Power Distribution Unit)

- Blade power control devices

- Lights-out devices

- Testing devices
The choice depends mainly on your budget and the kind of
hardware. For instance, if you're running a cluster on a set of
blades, then the power control device in the blade enclosure is
the only candidate for fencing. Of course, this device must be
capable of managing single blade computers.
The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming
increasingly popular, and in the future they may even become
standard equipment of off-the-shelf computers. They are, however,
inferior to UPS devices, because they share a power supply with
their host (a cluster node). If a node loses power, the device
that is supposed to control it is just as useless. Even though
this is obvious to us, the cluster manager is not in the know and
will try to fence the node in vain. This will continue forever,
because all other resource operations will wait for the
fencing/stonith operation to succeed.
The testing devices are used exclusively for testing purposes.
They are usually more gentle on the hardware. Once the cluster
goes into production, they must be replaced with real fencing
devices.
== STONITH (Shoot The Other Node In The Head)

Stonith is our fencing implementation. It provides the node level
fencing.

The stonith and fencing terms are often used interchangeably
here, as well as in other texts.
The stonith subsystem consists of two components:

- pacemaker-fenced

- stonith plugins
=== pacemaker-fenced

pacemaker-fenced is a daemon which may be accessed by local
processes or over the network. It accepts commands which
correspond to fencing operations: reset, power-off, and power-on.
It may also check the status of the fencing device.

pacemaker-fenced runs on every node in the CRM HA cluster. The
pacemaker-fenced instance running on the DC (Designated
Controller) node receives a fencing request from the CRM. It is
up to this and the other pacemaker-fenced programs to carry out
the desired fencing operation.
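pacemaker-fenced may also be exercised by hand with the
stonith_admin(8) tool, which is handy when testing a new setup.
Two illustrative invocations (the node name is, of course, an
assumption):

# list the fencing agents installed on this node
stonith_admin --list-installed
# ask the cluster to fence node1 by rebooting it
stonith_admin --reboot node1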
=== Stonith plugins

For every supported fencing device there is a stonith plugin
which is capable of controlling that device. A stonith plugin is
the interface to the fencing device. All stonith plugins look the
same to pacemaker-fenced, but are quite different on the other
side, reflecting the nature of the fencing device.
Some plugins support more than one device. A typical example is
ipmilan (or external/ipmi), which implements the IPMI protocol
and can control any device which supports this protocol.
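For illustration, a cluster resource using external/ipmi might
look like the following sketch. The addresses and credentials are
made up; run stonith -t external/ipmi -n for the authoritative
parameter list:

primitive st-ipmi-node1 stonith:external/ipmi \
    params hostname=node1 ipaddr=192.168.0.51 \
    userid=admin passwd=secret interface=lan
location l-st-ipmi-node1 st-ipmi-node1 -inf: node1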
== CRM stonith configuration

The fencing configuration consists of one or more stonith
resources.

A stonith resource is a resource of class stonith, and it is
configured just like any other resource. The list of parameters
(attributes) depends on, and is specific to, the stonith type.
Use the stonith(1) program to see the list:
$ stonith -t ibmhmc -n
ipaddr
$ stonith -t ipmilan -n
hostname ipaddr port auth priv login password reset_method

It is easy to guess the class of a fencing device from the set of
attribute names.
A short help text is also available:

$ stonith -t ibmhmc -h
STONITH Device: ibmhmc - IBM Hardware Management Console (HMC)
Use for IBM i5, p5, pSeries and OpenPower systems managed by HMC
Optional parameter name managedsyspat is white-space delimited
list of patterns used to match managed system names; if last
character is '*', all names that begin with the pattern are matched
Optional parameter name password is password for hscroot if
passwordless ssh access to HMC has NOT been setup (to do so,
it is necessary to create a public/private key pair with
empty passphrase - see "Configure the OpenSSH client" in the
redbook for more details)
For more information see
.You just said that there are pacemaker-fenced and stonith plugins. What's with these resources now?

Resources of class stonith are just a representation of stonith
plugins in the CIB. Well, a bit more: apart from the fencing
operations, the stonith resources may, just like any others, be
started, stopped, and monitored. The start and stop operations
are a bit of a misnomer: enable and disable would serve better,
but it's too late to change that. So, these two are actually
administrative operations and do not translate to any operation
on the fencing device itself. Monitor, however, does translate to
a device status check.
A dummy stonith resource configuration, which may be used in some
testing scenarios, is very simple:

primitive st-null stonith:null \
    params hostlist="node1 node2"
clone fencing st-null
All configuration examples are in the crm configuration tool
syntax. To apply them, put the sample in a text file, say
sample.txt, and run:

crm < sample.txt

The configure and commit lines are omitted from further examples.
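In other words, the complete sample.txt for the dummy
configuration above would be:

configure
primitive st-null stonith:null \
    params hostlist="node1 node2"
clone fencing st-null
commit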
An alternative configuration:

primitive st-node1 stonith:null \
    params hostlist="node1"
primitive st-node2 stonith:null \
    params hostlist="node2"
location l-st-node1 st-node1 -inf: node1
location l-st-node2 st-node2 -inf: node2

This configuration is perfectly alright as far as the cluster
software is concerned. The only difference from a real world
configuration is that no fencing operation takes place.
A more realistic configuration, but still only for testing, is
the following external/ssh configuration:

primitive st-ssh stonith:external/ssh \
    params hostlist="node1 node2"
clone fencing st-ssh

This one can also reset nodes. As you can see, this configuration
is remarkably similar to the first one, which features the null
stonith device.
.What is this clone thing?

Clones are a CRM/Pacemaker feature. A clone is basically a
shortcut: instead of defining _n_ identical, yet differently
named, resources, a single cloned resource suffices. By far the
most common use of clones is with stonith resources, if the
stonith device is accessible from all nodes.
The real device configuration is not much different, though some
devices may require more attributes. For instance, an IBM RSA
lights-out device might be configured like this:

primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
    params nodename=node1 ipaddr=192.168.0.101 \
    userid=USERID passwd=PASSW0RD
primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
    params nodename=node2 ipaddr=192.168.0.102 \
    userid=USERID passwd=PASSW0RD
# st-ibmrsa-1 can run anywhere but on node1
location l-st-node1 st-ibmrsa-1 -inf: node1
# st-ibmrsa-2 can run anywhere but on node2
location l-st-node2 st-ibmrsa-2 -inf: node2
.Why those strange location constraints?

There is always a certain probability that the stonith operation
is going to fail. Hence, a stonith operation whose target node is
also its executioner is not reliable. If the node is reset, then
it cannot send the notification about the fencing operation
outcome.
If you haven't already guessed, configuration of a UPS kind of
fencing device is remarkably similar to everything we have
already seen.
All UPS devices employ the same mechanics for fencing. What is,
however, different is how the device itself is accessed. Old UPS
devices, those that were considered professional, used to have
just a serial port, typically connected at 1200 baud using a
special serial cable. Many new ones still come equipped with a
serial port, but often they also sport a USB interface or an
Ethernet interface. The kind of connection we may make use of
depends on what the plugin supports. Let's see a few examples for
the APC UPS equipment:
$ stonith -t apcmaster -h

STONITH Device: apcmaster - APC MasterSwitch (via telnet)
NOTE: The APC MasterSwitch accepts only one (telnet)
connection/session a time. When one session is active,
subsequent attempts to connect to the MasterSwitch will fail.
For more information see http://www.apc.com/
List of valid parameter names for apcmaster STONITH device:
	ipaddr
	login
	password
$ stonith -t apcsmart -h

STONITH Device: apcsmart - APC Smart UPS
(via serial port - NOT USB!).
Works with higher-end APC UPSes, like
Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
(Smart-UPS may have to be >= Smart-UPS 700?).
See http://www.networkupstools.org/protocols/apcsmart.html
for protocol compatibility details.
For more information see http://www.apc.com/
List of valid parameter names for apcsmart STONITH device:
	ttydev
	hostlist
The former plugin supports APC UPS with a network port and the
telnet protocol. The latter plugin uses the APC SMART protocol
over the serial line, which is supported by many different APC
UPS product lines.
.So, what do I use: clones, constraints, both?

It depends. It depends on the nature of the fencing device: for
example, if the device cannot serve more than one connection at a
time, then clones won't do. It depends on how many hosts the
device can manage: if it's only one, and that is always the case
with lights-out devices, then again clones are right out. It also
depends on the number of nodes in your cluster: the more nodes,
the more desirable it is to use clones. Finally, it is also a
matter of personal preference.
In short: if clones are safe to use with your configuration and
if they reduce the configuration, then make cloned stonith
resources.

The CRM configuration is left as an exercise to the reader.
== Monitoring the fencing devices

Just like any other resource, the stonith class agents also
support the monitor operation. Given that we have often seen
monitor either not configured or configured in a wrong way, we
have decided to devote a section to the matter.

Monitoring stonith resources, which is actually checking the
status of the corresponding fencing devices, is strongly
recommended. So strongly, that we should consider a configuration
without it invalid.
On the one hand, though an indispensable part of an HA cluster, a
fencing device, being the last line of defense, is used seldom.
Very seldom and preferably never. On the other hand, for whatever
reason, power management equipment is known to be rather fragile
on the communication side. Some devices were known to give up if
there was too much broadcast traffic on the wire. Some cannot
handle more than ten or so connections per minute. Some get
confused or depressed if two clients try to connect at the same
time. Most cannot handle more than one session at a time. The
bottom line: try not to exercise your fencing device too often.
It may not like it. Use monitoring regularly, yet sparingly, say
once every couple of hours. The probability that within those few
hours there will be a need for a fencing operation and that the
power switch would fail is usually low.
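In crm syntax, such a monitor operation may be added to the IBM
RSA example from above like this (the two-hour interval and the
timeout are just reasonable values, not a prescription):

primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
    params nodename=node1 ipaddr=192.168.0.101 \
    userid=USERID passwd=PASSW0RD \
    op monitor interval=7200s timeout=60s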
== Odd plugins

Apart from plugins which handle real devices, some stonith
plugins are a bit out of line and deserve special attention.
=== external/kdumpcheck

Sometimes, it may be important to get a kernel core dump. This
plugin may be used to check if a dump is in progress. If that is
the case, then it will return true, as if the node has been
fenced, which is actually true given that it cannot run any
resources at the time. kdumpcheck is typically used in concert
with another, real, fencing device (see the sketch below). See
README_kdumpcheck.txt for more details.
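A sketch of such a combination, assuming the crm shell's
fencing_topology command and reusing the RSA devices configured
earlier (st-kdump is tried first; only if no dump is in progress
does the real fencing device fire; check
stonith -t external/kdumpcheck -n for the plugin's actual
parameters):

primitive st-kdump stonith:external/kdumpcheck \
    params hostlist="node1 node2"
fencing_topology \
    node1: st-kdump st-ibmrsa-1 \
    node2: st-kdump st-ibmrsa-2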
=== external/sbd

This is a self-fencing device. It reacts to a so-called "poison
pill" which may be inserted into a shared disk. On shared storage
connection loss, it also makes the node commit suicide. See
http://www.linux-ha.org/wiki/SBD_Fencing for more details.
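A configuration sketch, with a made-up shared disk path (verify
the parameter list with stonith -t external/sbd -n):

primitive st-sbd stonith:external/sbd \
    params sbd_device="/dev/sdc1"
clone fencing-sbd st-sbd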
=== meatware

Strange name and a simple concept. `meatware` requires help from a
human to operate. Whenever invoked, `meatware` logs a CRIT severity
message which should show up on the node's console. The operator
should then make sure that the node is down and issue a
`meatclient(8)` command to tell `meatware` that it's OK to tell the
cluster that it may consider the node dead. See `README.meatware`
for more information.
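The acknowledgement then amounts to something like the following
(the node name is assumed):

# after making absolutely sure that node1 is down:
meatclient -c node1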
=== null

This one is probably not of much importance to the general
public. It is used in various testing scenarios. `null` is an
imaginary device which always behaves and always claims that it
has shot a node, but never does anything. Sort of a
happy-go-lucky. Do not use it unless you know what you are doing.
=== suicide

`suicide` is a software-only device, which can reboot a node it is
running on. It depends on the operating system, so it should be
avoided whenever possible. But it is OK on one-node clusters.
`suicide` and `null` are the only exceptions to the "don't shoot my
host" rule.
.What about that pacemaker-fenced? You forgot about it, eh?

The pacemaker-fenced daemon, though it is really the master of
ceremonies, requires no configuration itself. All configuration
is stored in the CIB.