opensaf-5.21.09/src/imm/README.2PBE (31 May 2021, 10365 Bytes)
2PBE Allow IMM PBE to be configured without shared file system (4.4)
The 2PBE enhancement allows the IMM to have PBE configured so that it
does not rely on a shared filesystem, such as DRBD.
Executing IMM without PBE configured or enabled (0PBE) of course also
makes the IMM not rely on any shared filesystem, but you then do not get
automatic incremental persistence. Deployments that rarely update the
configuration data and rarely alter the "administrative state" on AMF data
should still consider the option of running without PBE. Persistence is
then achieved by performing an explicit dump after each CCB, after each
admin-op that changes administrative state, or at least after a set of such
persistent changes has been completed. This is the simplest configuration,
with the least overhead and the lowest resource consumption.
Regular 1PBE uses one Persistent Back End process started and controlled
by the IMMND coordinator.
With 2PBE a PBE process is started at *both* SCs. The PBE started by the
coordinator is called the *primary* PBE and operates in much the same way
as the regular 1PBE process, except it also synchronizes with the PBE at
the other SC. The PBE started by the non-coord but SC resident IMMND is
called the *slave* PBE.
The primary PBE and the slave PBE each write to what should normally be a
local database file. The database file has the same basic name as used by
regular 1PBE (IMMSV_PBE_FILE in immnd.conf), except that a suffix is appended
that consists of the processor's node-id, as defined by MDS. This suffix
*allows* 2PBE to be executed with the PBE files actually residing on a shared
file system. That would not be a good solution for deployment, but it may
simplify some test frameworks, which can then use the same file system
configuration for testing both 1PBE and 2PBE.
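The suffixing scheme above can be sketched as follows. This is an
illustrative helper, not OpenSAF code; the example node-id values are the
ones used later in this README:

```python
# Illustrative sketch, not OpenSAF code: how a 2PBE database file name is
# derived from the 1PBE base name (IMMSV_PBE_FILE) plus the MDS node-id.
def pbe_file_name(base_name, node_id):
    """Append the processor's MDS node-id as a suffix to the base name."""
    return "%s.%s" % (base_name, node_id)

# Example node-ids (2010f for SC1, 2020f for SC2, as used in this README).
print(pbe_file_name("imm.db", "2010f"))  # imm.db.2010f
print(pbe_file_name("imm.db", "2020f"))  # imm.db.2020f
```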
Configuring 2PBE
To configure 2PBE, first configure the same parameters as used by 1PBE,
although IMMSV_ROOT_DIRECTORY would for 2PBE normally point at a local file
system directory. Then also comment in the following parameter in immd.conf
(note immd.conf, not immnd.conf):
#export IMMSV_2PBE_PEER_SC_MAX_WAIT=30
This is the only 2PBE specific parameter. It has to be defined on both SCs
so that both the active and the standby IMMD become aware that 2PBE should
be run. The value of this parameter is the number of seconds the active IMMD
must wait for both SC IMMNDs to complete "preloading" (explained below) before
the active IMMD can choose one of these IMMNDs to become IMMND coordinator.
Note that it is not normal, i.e. not expected, for any given cluster to switch
between 1PBE and 2PBE. The decision to use 1PBE or 2PBE should be based on what
is available and intended to be used as storage for the imm service. Typically
that does not change during the service lifetime for a cluster. Thus the choice
of 1PBE or 2PBE (or 0PBE) is normally done at cluster installation time.
The cardinal example of a deployment that should use 2PBE (or 0PBE) is a
downsized embedded system that does not have any shared filesystem available.
A 2PBE system can disable and enable PBE in the same way that a 1PBE system
can, using the administrative operation for this; see the regular section on
PBE.
Cluster-start & IMM loading from PBE-files for 2PBE
The active IMMD will order each SC resident IMMND to execute a "preload", probing
the SC local filesystem for the file state that *would* be loaded to the cluster
if that SC IMMND was chosen as coord. The two SC IMMNDs send the preload stats to
the active IMMD. On each SC, the file probe starts by attempting to open the
database file with the 2PBE suffix (e.g. imm.db.2010f at SC1 and imm.db.2020f at
SC2). If the file with the suffix does not exist (can not be opened), then a new
probe is tried using the database file without a suffix (e.g. imm.db at both SC1
and SC2). If that probe also fails, then the last alternative is to try to open
the IMMSV_LOAD_FILE (e.g. imm.xml).
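The probe order above can be sketched as follows. This is an illustrative
helper, not OpenSAF code, where `existing` stands for the set of file names
present on the SC-local filesystem:

```python
# Illustrative sketch, not OpenSAF code: the preload file probe order.
def preload_probe(existing, pbe_file="imm.db", node_id="2010f",
                  load_file="imm.xml"):
    """Return the first file that can be "opened", or None if none can.

    Probe order: 1) the PBE file with the 2PBE node-id suffix,
                 2) the PBE file without suffix,
                 3) the IMMSV_LOAD_FILE (e.g. imm.xml).
    """
    for candidate in ("%s.%s" % (pbe_file, node_id), pbe_file, load_file):
        if candidate in existing:
            return candidate
    return None

print(preload_probe({"imm.db.2010f", "imm.xml"}))  # imm.db.2010f
print(preload_probe({"imm.db", "imm.xml"}))        # imm.db
print(preload_probe({"imm.xml"}))                  # imm.xml
```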
When initially starting a 2PBE system, take care that at least the imm.xml file
exists at both SCs. Otherwise there is a risk that the preloading will hang for
quite some time when the IMMND restarts at an SC.
The active IMMD will wait for the IMMNDs at *both* SCs to complete the preload
task and then determine which SC has the apparently latest file state. The IMMND
at that SC will then be chosen as IMMND coord. Should the timeout be reached,
then the active IMMD declares the only available SC IMMND as coordinator. This
should be avoided.
Actual loading then proceeds in the same way as for regular 1PBE. The
IMMSV_2PBE_PEER_SC_MAX_WAIT is by default 30 seconds. This value should
be high enough to make it extremely unlikely that the active IMMD is forced
to choose a coord/loader when only a single SC IMMND has joined. If that
happens, then the risk is that the cluster restart will be done *not* using
the latest persistent imm state, effectively rewinding the imm state. Normally
the two PBE files should be identical and the choice of coord/loader then does
not matter. But if they are not identical, due to one SC having been down for
some time before the cluster restart, then the choice of the SC to load from
does matter. [Note: the same type of problem will happen with regular 1PBE
based on a shared filesystem (DRBD) if one SC fails to come up in time to join
the (DRBD) sync protocol. The corresponding DRBD timeout is on the order of 20
seconds. Even if the other SC later joins, it will be too late because by that
time the loading has probably been completed. Even if loading is still in
progress, DRBD can not correct/mutate the PBE file while it is being read by
the sqlite-lib for loading.]
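The waiting and fallback behaviour described above can be sketched as
follows. This is an illustrative model, not OpenSAF code; in particular, the
preload statistics are abstracted here into a single per-SC "recency" number,
whereas the real IMMNDs report more detailed file state:

```python
# Illustrative sketch, not OpenSAF code: how the active IMMD picks the
# IMMND coordinator (and thus the loader) from the preload results.
def choose_coord(preload_results, timed_out=False):
    """preload_results maps SC name -> recency of its preloaded file state.

    With both SCs reporting, the SC with the latest file state wins.
    After IMMSV_2PBE_PEER_SC_MAX_WAIT, the only reporting SC is chosen,
    which risks loading stale state and should be avoided.
    """
    if len(preload_results) == 2:
        return max(preload_results, key=preload_results.get)
    if timed_out and preload_results:
        return next(iter(preload_results))
    return None  # keep waiting for the other SC

print(choose_coord({"SC-1": 7, "SC-2": 9}))       # SC-2
print(choose_coord({"SC-1": 7}))                  # None
print(choose_coord({"SC-1": 7}, timed_out=True))  # SC-1
```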
Normal processing with 2PBE
When loading has completed, two PBEs will be started. The primary PBE at the
SC with the IMMND coord and the slave PBE at the other SC.
In the same way as for 1PBE, the primary PBE is the transaction coordinator
for CCB commits, PRT operations and class creates/deletes. Specific to 2PBE,
the slave PBE is in essence a class applier for all configuration classes,
recording the same data as the primary PBE, but in a file at the other SC.
The primary PBE is thus the entity that decides the outcome of a CCB that is
in the critical state. [The critical state is defined by the commit request
having been sent to the (primary) PBE]. If the primary PBE acks the commit, the
CCB commits in imm-ram. Finally, all appliers that were tracking the CCB get
the commit/apply callback, including the slave PBE.
With 2PBE, *both* PBEs must be available for the imm to be persistent-writable.
If one or both PBEs are unavailable (or unresponsive) then persistent writes
(CCBs, PRT operations, class changes) will fail.
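The availability rule above, together with the 1safe2PBE escape hatch
described later in this README, can be sketched as follows. This is an
illustrative model, not OpenSAF code:

```python
# Illustrative sketch, not OpenSAF code: when 2PBE accepts persistent writes.
def persistent_writes_allowed(pbes_available, one_safe_2pbe=False):
    """pbes_available: number of available/responsive PBEs (0, 1 or 2).

    Normally both PBEs must be up; the 1safe2PBE flag temporarily allows
    writes with a single PBE (e.g. while one SC is down for repair).
    """
    if pbes_available >= 2:
        return True
    return one_safe_2pbe and pbes_available == 1

print(persistent_writes_allowed(2))                      # True
print(persistent_writes_allowed(1))                      # False
print(persistent_writes_allowed(1, one_safe_2pbe=True))  # True
```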
In 2PBE, a restarted PBE (primary or slave) will more often need to regenerate
its file (from imm-ram). On the other hand, regeneration of the file should be
faster in 2PBE than in regular 1PBE because the file is typically placed on a
local file system.
If an SC is taken down by order from the operator, i.e. a controlled shutdown,
then the operator can also (directly or indirectly) request that persistent
writes be allowed despite only one PBE being available in a 2PBE system. This
is also typically needed for an uncontrolled and unexpected departure of an SC,
if that SC does not immediately bounce back up. A repair is then apparently
needed and the system *must* be allowed to function with only one PBE, despite
that it then only writes to one local filesystem using one PBE.
The 1safe2PBE mechanism allows a 2PBE OpenSAF cluster to open up for persistent
writes using only one of the two PBEs - temporarily. This is only intended to be
used as a temporary state when one SC is long term unavailable. As soon as the
other SC returns, the IMM will automatically re-enter normal 2safe2PBE and
will reject persistent writes and attempts to enter 1safe2PBE until the slave
PBE has synced (regenerated its file) and rejoined the cluster.
The 1safe2PBE state is entered by the administrative operation:

immadm -o 1 -a safImmService -p opensafImmNostdFlags:SA_UINT32_T:8 \
opensafImm=opensafImm,safApp=safImmService
It is exited either automatically by a rejoined SC or by an explicit
administrative operation:

immadm -o 2 -a safImmService -p opensafImmNostdFlags:SA_UINT32_T:8 \
opensafImm=opensafImm,safApp=safImmService
Note the explicit setting of admin-owner-name using "-a safImmService". This
should be used for these admin-operations because the imm service needs
admin-ownership over the object "opensafImm=opensafImm,safApp=safImmService"
in order for 2PBE to work properly.
Hence the fourth bit in the opensafImmNostdFlags bitvector of the OpenSAF
service object is used to toggle on/off oneSafe2PBE. Toggling this bit on
older systems, or systems that do not have 2PBE configured, will have no
effect. Toggling this bit on (on a 2PBE system) is only accepted by the IMM
service when there is only one SC available.
We recommend that any deployment of OpenSAF that intends to allow usage of 2PBE
invoke the toggling on of this bit in the wrapper function for performing a
planned stop of an SC. Note however that the operation will only succeed when
the SC has gone down, i.e. there is only one SC available. Similarly, if there
is an alarm generated when an SC has gone down and not come back up quickly
enough (a node-repair-needed alarm), then we suggest that the alarm trigger the
invocation of the admin op to toggle this flag on.
For "normal", but unplanned, processor restarts, we recommend that this flag
not be toggled on. This means that for such processor restarts, persistent
writes will not be allowed until both SCs are available again.
2PBE with spares (#79 & #1925)
From OpenSAF 5.0 the mds_register for the IMMD is delayed until amfd comes up
and assigns the role. Because of this the standby role is delayed. When spares
are configured there is a chance that the chosen standby IMMND is on a
different node than the node that actually received the standby role. It is
not recommended to configure 2PBE with spares.