"Fossies" - the Fresh Open Source Software Archive

Member "pacemaker-Pacemaker-2.1.2/doc/crm_fencing.txt" (24 Nov 2021, 16932 Bytes) of package /linux/misc/pacemaker-Pacemaker-2.1.2.tar.gz:


As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard) with prefixed line numbers. Alternatively you can here view or download the uninterpreted source code file. See also the last Fossies "Diffs" side-by-side code changes report for "crm_fencing.txt": 2.1.0_vs_2.1.1.

Fencing and Stonith
===================
Dejan_Muhamedagic <dejan@suse.de>
v0.9

Fencing is a very important concept in computer clusters for HA
(High Availability). Unfortunately, given that fencing does not
offer a visible service to users, it is often neglected.

Fencing may be defined as a method to bring an HA cluster to a
known state. But what is a "cluster state" after all? To answer
that question we have to look at what is in the cluster.

== Introduction to HA clusters

Any computer cluster may be loosely defined as a collection of
cooperating computers or nodes. Nodes talk to each other over
communication channels, which are typically standard network
connections, such as Ethernet.

The main purpose of an HA cluster is to manage user services.
Typical examples of user services are an Apache web server or,
say, a MySQL database. From the user's point of view, the
services do some specific and hopefully useful work when ordered
to do so. To the cluster, however, they are just things which may
be started or stopped. This distinction is important, because the
nature of the service is irrelevant to the cluster. In cluster
lingo, user services are known as resources.

Every resource has a state attached, for instance: "resource r1
is started on node1". In an HA cluster, such a state implies that
"resource r1 is stopped on all nodes but node1", because an HA
cluster must make sure that every resource may be started on at
most one node.

A collection of resource states and node states is the cluster
state.

Every node must report every change that happens to resources.
This can happen only for running resources, because a node
should not start a resource unless told to do so by somebody.
That somebody is the Cluster Resource Manager (CRM) in our case.

So far so good. But what if, for whatever reason, we cannot
establish with certainty the state of some node or resource? This
is where fencing comes in. With fencing, even when the cluster
doesn't know what is happening on some node, we can make sure
that the node doesn't run any resources, or at least the
important ones.

If you wonder how this can happen, consider the many risks
involved with computing: reckless people, power outages, natural
disasters, rodents, thieves, software bugs, just to name a few.
We are sure that your computer has failed unpredictably at least
a few times.

== Fencing

There are two kinds of fencing: resource level and node level.

Using resource-level fencing, the cluster can make sure that
a node cannot access one or more resources. One typical example
is a SAN, where a fencing operation changes rules on a SAN switch
to deny access from a node.

Resource-level fencing may also be achieved using normal resources
on which the resource we want to protect depends. Such a
resource would simply refuse to start on the fenced node, and
therefore resources which depend on it cannot run on the same
node either.
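
For instance, a minimal sketch in the crm shell syntax used later
in this text might look like the following. The ocf:heartbeat:Dummy
agents are only placeholders here: in a real setup "gate" would be
whatever resource actually guards access to the storage or service.

	# "gate" stands in for the agent that guards access
	# (for example a SCSI reservation or SAN zoning resource)
	primitive gate ocf:heartbeat:Dummy
	primitive protected ocf:heartbeat:Dummy
	# protected may only run where gate runs ...
	colocation c-protected-gate inf: protected gate
	# ... and only after gate has started
	order o-gate-before-protected inf: gate protected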

Node-level fencing makes sure that a node does not run any
resources at all. This is usually done in a very simple yet
brutal way: the node is simply reset using a power switch. This
may ultimately be necessary because the node may not be
responsive at all.

Node-level fencing is our primary subject below.

== Node level fencing devices

Before we get into the configuration details, you need to pick a
fencing device for node-level fencing. There are quite a few to
choose from. To see the list of supported stonith devices, just
run:

	stonith -L

Stonith devices may be classified into five categories:

- UPS (Uninterruptible Power Supply)

- PDU (Power Distribution Unit)

- Blade power control devices

- Lights-out devices

- Testing devices

The choice depends mainly on your budget and the kind of
hardware. For instance, if you're running a cluster on a set of
blades, then the power control device in the blade enclosure is
the only candidate for fencing. Of course, this device must be
capable of managing single blade computers.

Lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming
increasingly popular, and in the future they may even become
standard equipment on off-the-shelf computers. They are, however,
inferior to UPS devices, because they share a power supply with
their host (a cluster node). If a node loses power, the device
that is supposed to control it is just as useless. Even though
this is obvious to us, the cluster manager is not in the know and
will try to fence the node in vain. This will continue forever,
because all other resource operations will wait for the
fencing/stonith operation to succeed.

The testing devices are used exclusively for testing purposes.
They are usually more gentle on the hardware. Once the cluster
goes into production, they must be replaced with real fencing
devices.

== STONITH (Shoot The Other Node In The Head)

Stonith is our fencing implementation. It provides node-level
fencing.

.NB
The terms stonith and fencing are often used interchangeably
here, as well as in other texts.

The stonith subsystem consists of two components:

- pacemaker-fenced

- stonith plugins

=== pacemaker-fenced

pacemaker-fenced is a daemon which may be accessed by local
processes or over the network. It accepts commands which
correspond to fencing operations: reset, power-off, and power-on.
It may also check the status of the fencing device.

pacemaker-fenced runs on every node in the CRM HA cluster. The
pacemaker-fenced instance running on the DC node receives a fencing
request from the CRM. It is up to this and the other pacemaker-fenced
instances to carry out the desired fencing operation.
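
For instance, Pacemaker's stonith_admin(8) tool talks to
pacemaker-fenced directly, which makes it handy for a quick sanity
check. The exact options can vary between Pacemaker versions, so
consult the man page; a rough sketch:

	# list the fencing devices currently registered with pacemaker-fenced
	stonith_admin --list-registered

	# ask pacemaker-fenced to reboot node2 (careful: this really fences it)
	stonith_admin --reboot node2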

=== Stonith plugins

For every supported fencing device there is a stonith plugin
which is capable of controlling that device. A stonith plugin is
the interface to the fencing device. All stonith plugins look the
same to pacemaker-fenced, but are quite different on the other side,
reflecting the nature of the fencing device.

Some plugins support more than one device. A typical example is
ipmilan (or external/ipmi), which implements the IPMI protocol and
can control any device that supports this protocol.

== CRM stonith configuration

The fencing configuration consists of one or more stonith
resources.

A stonith resource is a resource of class stonith, and it is
configured just like any other resource. The list of parameters
(attributes) depends on and is specific to the stonith type. Use
the stonith(1) program to see the list:

	$ stonith -t ibmhmc -n
	ipaddr
	$ stonith -t ipmilan -n
	hostname  ipaddr  port  auth  priv  login  password reset_method

.NB
It is easy to guess the class of a fencing device from
the set of attribute names.

A short help text is also available:

	$ stonith -t ibmhmc -h
	STONITH Device: ibmhmc - IBM Hardware Management Console (HMC)
	Use for IBM i5, p5, pSeries and OpenPower systems managed by HMC
	  Optional parameter name managedsyspat is white-space delimited
	list of patterns used to match managed system names; if last
	character is '*', all names that begin with the pattern are matched
	  Optional parameter name password is password for hscroot if
	passwordless ssh access to HMC has NOT been setup (to do so,
	it is necessary to create a public/private key pair with
	empty passphrase - see "Configure the OpenSSH client" in the
	redbook for more details)
	For more information see
	http://publib-b.boulder.ibm.com/redbooks.nsf/RedbookAbstracts/SG247038.html

.You just said that there are pacemaker-fenced and stonith plugins. What's with these resources now?
**************************
Resources of class stonith are just a representation of stonith
plugins in the CIB. Well, a bit more: apart from the fencing
operations, the stonith resources, just like any others, may be
started, stopped, and monitored. The start and stop operations
are a bit of a misnomer: enable and disable would serve better,
but it's too late to change that. So, these two are actually
administrative operations and do not translate to any operation
on the fencing device itself. Monitor, however, does translate to
a device status check.
**************************

A dummy stonith resource configuration, which may be used in some
testing scenarios, is very simple:

	configure
	primitive st-null stonith:null \
		params hostlist="node1 node2"
	clone fencing st-null
	commit

.NB
**************************
All configuration examples are in the crm configuration tool
syntax. To apply them, put the sample in a text file, say
sample.txt, and run:

	crm < sample.txt

The configure and commit lines are omitted from further examples.
**************************

An alternative configuration:

	primitive st-node1 stonith:null \
		params hostlist="node1"
	primitive st-node2 stonith:null \
		params hostlist="node2"
	location l-st-node1 st-node1 -inf: node1
	location l-st-node2 st-node2 -inf: node2

This configuration is perfectly all right as far as the cluster
software is concerned. The only difference from a real-world
configuration is that no fencing operation actually takes place.

More realistic, but still only for testing, is the following
external/ssh configuration:

	primitive st-ssh stonith:external/ssh \
		params hostlist="node1 node2"
	clone fencing st-ssh

This one can also reset nodes. As you can see, this configuration
is remarkably similar to the first one, which features the null
stonith device.

.What is this clone thing?
**************************
Clones are a CRM/Pacemaker feature. A clone is basically a
shortcut: instead of defining _n_ identical, yet differently named
resources, a single cloned resource suffices. By far the most
common use of clones is with stonith resources if the stonith
device is accessible from all nodes.
**************************

The real device configuration is not much different, though some
devices may require more attributes. For instance, an IBM RSA
lights-out device might be configured like this:

	primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
		params nodename=node1 ipaddr=192.168.0.101 \
		userid=USERID passwd=PASSW0RD
	primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
		params nodename=node2 ipaddr=192.168.0.102 \
		userid=USERID passwd=PASSW0RD
	# st-ibmrsa-1 can run anywhere but on node1
	location l-st-node1 st-ibmrsa-1 -inf: node1
	# st-ibmrsa-2 can run anywhere but on node2
	location l-st-node2 st-ibmrsa-2 -inf: node2

.Why those strange location constraints?
**************************
There is always a certain probability that the stonith operation
is going to fail. Hence, a stonith operation whose target node is
also its executioner is not reliable. If the node is reset, it
cannot send the notification about the outcome of the fencing
operation.
**************************

If you haven't already guessed, configuration of a UPS kind of
fencing device is remarkably similar to everything we have
already shown.

All UPS devices employ the same mechanics for fencing. What
differs, however, is how the device itself is accessed. Old UPS
devices, those that were considered professional, used to have
just a serial port, typically connected at 1200 baud using a
special serial cable. Many new ones still come equipped with a
serial port, but often they also sport a USB or Ethernet
interface. The kind of connection we may make use of depends on
what the plugin supports. Let's see a few examples for APC UPS
equipment:

	$ stonith -t apcmaster -h

	STONITH Device: apcmaster - APC MasterSwitch (via telnet)
	NOTE: The APC MasterSwitch accepts only one (telnet)
	connection/session a time. When one session is active,
	subsequent attempts to connect to the MasterSwitch will fail.
	For more information see http://www.apc.com/
	List of valid parameter names for apcmaster STONITH device:
		ipaddr
		login
		password

	$ stonith -t apcsmart -h

	STONITH Device: apcsmart - APC Smart UPS
	 (via serial port - NOT USB!).
	 Works with higher-end APC UPSes, like
	 Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
	 (Smart-UPS may have to be >= Smart-UPS 700?).
	 See http://www.networkupstools.org/protocols/apcsmart.html
	 for protocol compatibility details.
	For more information see http://www.apc.com/
	List of valid parameter names for apcsmart STONITH device:
		ttydev
		hostlist

The former plugin supports APC UPSes with a network port and the
telnet protocol. The latter uses the APC smart protocol over the
serial line, which is supported by many different APC UPS product
lines.

.So, what do I use: clones, constraints, both?
**************************
It depends. It depends on the nature of the fencing device: for
example, if the device cannot serve more than one connection at a
time, then clones won't do. It depends on how many hosts the
device can manage: if it's only one, and that is always the case
with lights-out devices, then again clones are right out. It also
depends on the number of nodes in your cluster: the more nodes,
the more desirable clones become. Finally, it is also a matter of
personal preference.

In short: if clones are safe to use with your configuration and
if they reduce the size of the configuration, then make cloned
stonith resources.
**************************

The CRM configuration is left as an exercise to the reader.

== Monitoring the fencing devices

Just like any other resource, the stonith class agents also
support the monitor operation. Given that we have often seen
monitor either not configured or configured in the wrong way, we
have decided to devote a section to the matter.

Monitoring stonith resources, which is actually checking the
status of the corresponding fencing devices, is strongly
recommended. So strongly, that we would consider a configuration
without it invalid.

On the one hand, though an indispensable part of an HA cluster, a
fencing device, being the last line of defense, is used seldom.
Very seldom and preferably never. On the other hand, for whatever
reason, power management equipment is known to be rather fragile
on the communication side. Some devices were known to give up if
there was too much broadcast traffic on the wire. Some cannot
handle more than ten or so connections per minute. Some get
confused or depressed if two clients try to connect at the same
time. Most cannot handle more than one session at a time. The
bottom line: try not to exercise your fencing device too often.
It may not like it. Use monitoring regularly, yet sparingly, say
once every couple of hours. The probability that within those few
hours there will be a need for a fencing operation and that the
power switch would fail is usually low.
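
For example, building on the IBM RSA primitive shown earlier, a
monitor operation every couple of hours could look like this (the
interval and timeout values below are merely a reasonable starting
point, not something prescribed by the plugin):

	primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
		params nodename=node1 ipaddr=192.168.0.101 \
		userid=USERID passwd=PASSW0RD \
		op monitor interval=120m timeout=60s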

== Odd plugins

Apart from plugins which handle real devices, some stonith
plugins are a bit out of line and deserve special attention.

=== external/kdumpcheck

Sometimes, it may be important to get a kernel core dump. This
plugin may be used to check if the dump is in progress. If
that is the case, then it will return true, as if the node has
been fenced, which is actually true given that it cannot run
any resources at the time. kdumpcheck is typically used in
concert with another, real, fencing device. See
README_kdumpcheck.txt for more details.

=== external/sbd

This is a self-fencing device. It reacts to a so-called "poison
pill", which may be inserted into a shared disk. It also makes
the node commit suicide if the connection to the shared storage
is lost. See http://www.linux-ha.org/wiki/SBD_Fencing for more
details.
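
A minimal configuration sketch; the device path below is only a
placeholder, and the parameter name should be verified with
`stonith -t external/sbd -n`, as it may differ between plugin
versions:

	primitive st-sbd stonith:external/sbd \
		params sbd_device="/dev/disk/by-id/my-shared-disk-part1"
	clone fencing-sbd st-sbd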

=== meatware

Strange name and a simple concept. `meatware` requires help from a
human to operate. Whenever invoked, `meatware` logs a CRIT severity
message which should show up on the node's console. The operator
should then make sure that the node is down and issue a
`meatclient(8)` command to tell `meatware` that it is OK to tell the
cluster that it may consider the node dead. See `README.meatware`
for more information.
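
For example, once the operator has confirmed that node1 really is
down, acknowledging the fencing request could look like this (see
meatclient(8) for the exact invocation):

	meatclient -c node1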

=== null

This one is probably not of much importance to the general
public. It is used in various testing scenarios. `null` is an
imaginary device which always behaves and always claims that it
has shot a node, but never does anything. Sort of a
happy-go-lucky device. Do not use it unless you know what you are
doing.

=== suicide

`suicide` is a software-only device which can reboot the node it is
running on. It depends on the operating system, so it should be
avoided whenever possible, but it is OK on one-node clusters.
`suicide` and `null` are the only exceptions to the "don't shoot my
host" rule.

.What about that pacemaker-fenced? You forgot about it, eh?
**************************
The pacemaker-fenced daemon, though it is really the master of
ceremonies, requires no configuration itself. All configuration is
stored in the CIB.
**************************

== Resources

http://www.linux-ha.org/wiki/STONITH

https://www.clusterlabs.org/doc/crm_fencing.html

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/

http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html