"Fossies" - the Fresh Open Source Software Archive

Member "opensaf-5.21.09/src/imm/README.2PBE" (31 May 2021, 10365 Bytes) of package /linux/misc/opensaf-5.21.09.tar.gz:


As a special service "Fossies" has tried to format the requested text file into HTML format (style: standard) with prefixed line numbers. Alternatively you can here view or download the uninterpreted source code file.

2PBE Allow IMM PBE to be configured without shared file system (4.4)
===================================================================
https://sourceforge.net/p/opensaf/tickets/21/

The 2PBE enhancement allows the IMM to have PBE configured so that it
does not rely on a shared filesystem, such as DRBD.

Executing IMM without PBE configured or enabled (0PBE) of course also
means that the IMM does not rely on any shared filesystem, but you then do
not get automatic incremental persistence. Deployments that rarely update
the configuration data and rarely alter the "administrative state" of AMF
data should still consider the option of running without PBE. Persistence
is then achieved by performing an explicit dump after each CCB, after each
admin-op that changes administrative state, or at least after a set of such
persistent changes has been completed. This is the simplest configuration,
with the least overhead and the lowest resource consumption.
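
For illustration, a minimal sketch of such an explicit dump, assuming the
standard immdump tool is available and using an example target path (the
path, and how the dump is then moved into place, are deployment choices):

  immdump /tmp/imm-dump.xml

The resulting XML file can then be put in place as the configured
IMMSV_LOAD_FILE (e.g. imm.xml) so that it is used at the next cluster start.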

Regular 1PBE uses one Persistent Back End process started and controlled
by the IMMND coordinator.

With 2PBE a PBE process is started at *both* SCs. The PBE started by the
coordinator is called the *primary* PBE and operates in much the same way
as the regular 1PBE process, except that it also synchronizes with the PBE
at the other SC. The PBE started by the non-coord but SC resident IMMND is
called the *slave* PBE.

The primary PBE and the slave PBE each write to what should normally be a
local database file. The database file has the same basic name as used by
regular 1PBE (IMMSV_PBE_FILE in immnd.conf), except that a suffix is
appended consisting of the processor's node-id, as defined by MDS. This
suffix *allows* 2PBE to be executed with the PBE files actually residing on
a shared file system. That would not be a good solution for deployment, but
it may simplify some test frameworks, which can then use the same file
system configuration for testing both 1PBE and 2PBE.

Configuring 2PBE
----------------
To configure 2PBE, first configure the same parameters as used by 1PBE,
although IMMSV_ROOT_DIRECTORY would for 2PBE normally point at a local file
system directory. Then also uncomment the following parameter in immd.conf
(note immd.conf, not immnd.conf):

  #export IMMSV_2PBE_PEER_SC_MAX_WAIT=30

This is the only 2PBE specific parameter. It has to be defined on both SCs
so that both the active and the standby IMMD become aware that 2PBE should
be run. The value of this parameter is the number of seconds the active IMMD
must wait for both SC IMMNDs to complete "preloading" (explained below) before
the active IMMD can choose one of these IMMNDs to become IMMND coordinator.
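
As an illustration, a minimal sketch of the relevant settings, assuming the
default configuration file locations under /etc/opensaf and example values
for the directory paths:

  # immnd.conf: same parameters as for 1PBE, but normally on a local fs
  export IMMSV_ROOT_DIRECTORY=/var/lib/opensaf/immsv_store
  export IMMSV_LOAD_FILE=imm.xml
  export IMMSV_PBE_FILE=imm.db

  # immd.conf: 2PBE specific, must be set on *both* SCs
  export IMMSV_2PBE_PEER_SC_MAX_WAIT=30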

Note that it is not normal, i.e. not expected, for any given cluster to switch
between 1PBE and 2PBE. The decision to use 1PBE or 2PBE should be based on what
is available and intended to be used as storage for the imm service. Typically
that does not change during the service lifetime of a cluster. Thus the choice
of 1PBE or 2PBE (or 0PBE) is normally made at cluster installation time.

The cardinal example of a deployment that should use 2PBE (or 0PBE) is a
downsized embedded system that does not have any shared filesystem available.

A 2PBE system can disable and enable PBE in the same way that a 1PBE system
can, using the administrative operation for this; see the regular section
on PBE.

Cluster-start & IMM loading from PBE-files for 2PBE
---------------------------------------------------
The active IMMD will order each SC resident IMMND to execute a "preload",
probing the SC local filesystem for the file state that *would* be loaded to
the cluster if that SC IMMND was chosen as coord. The two SC IMMNDs send the
preload stats to the active IMMD. On each SC, the file probe starts by
attempting to open the database file with the 2PBE suffix (e.g. imm.db.2010f
at SC1 and imm.db.2020f at SC2). If the file with suffix does not exist
(cannot be opened), then a new probe is tried using the database file without
suffix (e.g. imm.db at both SC1 and SC2). If that probe also fails, then the
last alternative is to try to open the IMMSV_LOAD_FILE (e.g. imm.xml).
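
For illustration only, the probe order can be sketched as a small shell
fragment; the directory, file names and node-id below are examples, and the
real probing is done internally by the IMMND, not by a script:

  DIR=/var/lib/opensaf/immsv_store   # example IMMSV_ROOT_DIRECTORY
  NODE_ID=2010f                      # example MDS node-id of this SC
  if [ -r "$DIR/imm.db.$NODE_ID" ]; then
    echo "preload from imm.db.$NODE_ID"        # 2PBE suffixed file
  elif [ -r "$DIR/imm.db" ]; then
    echo "preload from imm.db"                 # unsuffixed PBE file
  elif [ -r "$DIR/imm.xml" ]; then
    echo "preload from imm.xml"                # IMMSV_LOAD_FILE
  else
    echo "nothing to preload from at this SC"
  fi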

When initially starting a 2PBE system, take care that at least the imm.xml
file exists at both SCs. Otherwise there is a risk that the preloading will
hang for quite some time when the IMMND restarts at an SC.
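
A simple pre-start check along these lines can help avoid that situation;
the path is an example and must match the actual immnd.conf settings:

  # Run on each SC before the initial cluster start.
  test -r /var/lib/opensaf/immsv_store/imm.xml \
    || echo "WARNING: imm.xml missing on $(hostname), preloading may hang"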

The active IMMD will wait for the IMMNDs at *both* SCs to complete the preload
task and then determine which SC has the apparently latest file state. The
IMMND at that SC will then be chosen as IMMND coord. Should the timeout be
reached, then the active IMMD declares the only available SC IMMND as
coordinator. This should be avoided.

Actual loading then proceeds in the same way as for regular 1PBE. The
IMMSV_2PBE_PEER_SC_MAX_WAIT is by default 30 seconds. This value should
be high enough to make it extremely unlikely that the active IMMD is forced
to choose coord/loader when only a single SC IMMND has joined. If that
happens, then the risk is that the cluster restart will be done *not* using
the latest persistent imm state, effectively rewinding the imm state.
Normally the two PBE files should be identical and the choice of coord/loader
then does not matter. But if they are not identical, due to one SC having
been down for some time before the cluster restart, then the choice of the SC
to load from does matter. [Note: the same type of problem will happen with
regular 1PBE based on a shared filesystem (DRBD) if one SC fails to come up
in time to join the (DRBD) sync protocol. The corresponding DRBD timeout is
on the order of 20 seconds. Even if the other SC later joins, it will be too
late because by that time the loading has probably been completed. Even if
loading is still in progress, DRBD cannot correct/mutate the PBE file while
it is being read by the sqlite-lib for loading.]

Normal processing with 2PBE
---------------------------
When loading has completed, two PBEs will be started: the primary PBE at the
SC with the IMMND coord and the slave PBE at the other SC.

In the same way as for 1PBE, the primary PBE is the transaction coordinator
for CCB commits, PRT operations and class-creates/deletes. Specific to 2PBE,
the slave PBE is in essence a class applier for all configuration classes,
recording the same data as the primary PBE, but on a file at the other SC.

The primary PBE is thus the entity that decides the outcome of a CCB that is
in the critical state. [The critical state is defined by the commit request
having been sent to the (primary) PBE]. If the primary PBE acks the commit,
the CCB commits in imm-ram. Finally, all appliers that were tracking the CCB
get the commit/apply callback, including the slave PBE.

With 2PBE, *both* PBEs must be available for the imm to be persistent-writable.
If one or both PBEs are unavailable (or unresponsive) then persistent writes
(CCBs, PRT operations, class changes) will fail.
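
For example, any ordinary configuration change constitutes a persistent write
and will be rejected while a PBE is unavailable. The object and attribute
below are only an illustrative example of such a change:

  immcfg -a saAmfNodeAutoRepair=1 safAmfNode=SC-1,safAmfCluster=myAmfCluster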

In 2PBE, a restarted PBE (primary or slave) will more often need to regenerate
its file (from imm-ram). On the other hand, regeneration of the file should be
faster in 2PBE than in regular 1PBE because the file is typically placed on a
local file system.

OneSafe2PBE
-----------
If an SC is taken down by order from the operator, i.e. a controlled shutdown,
then the operator can also (directly or indirectly) request that persistent
writes be allowed despite only one PBE being available in a 2PBE system. This
is also typically needed for an uncontrolled and unexpected departure of an SC
if that SC does not immediately bounce back up. A repair is then apparently
needed and the system *must* be allowed to function with only one PBE, even
though it then only writes to one local filesystem using one PBE.

The 1safe2PBE mechanism allows a 2PBE OpenSAF cluster to open up for persistent
writes using only one of the two PBEs - temporarily. This is only intended to
be used as a temporary state when one SC is unavailable long term. As soon as
the other SC returns, the IMM will automatically re-enter normal 2-safe2PBE; it
will reject persistent writes and attempts to enter 1safe2PBE until the slave
PBE has synced (regenerated its file) and rejoined the cluster.

The 1safe2PBE state is entered by the administrative operation:

  immadm -o 1 -a safImmService -p opensafImmNostdFlags:SA_UINT32_T:8 \
     opensafImm=opensafImm,safApp=safImmService

It is exited either automatically by a rejoined SC or by an explicit
administrative operation:

  immadm -o 2 -a safImmService -p opensafImmNostdFlags:SA_UINT32_T:8 \
     opensafImm=opensafImm,safApp=safImmService

Note the explicit setting of admin-owner-name using "-a safImmService". This
should be used for these admin-operations because the imm service needs
admin-ownership over the object "opensafImm=opensafImm,safApp=safImmService"
in order for 2PBE to work properly.
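
To check whether oneSafe2PBE is currently toggled on, the flags attribute can
be inspected with the standard immlist tool (shown here as an assumption about
the tools installed on the system):

  immlist -a opensafImmNostdFlags opensafImm=opensafImm,safApp=safImmService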

Hence the fourth bit (value 8) in the opensafImmNostdFlags bitvector of the
OpenSAF service object is used to toggle oneSafe2PBE on/off. Toggling this
bit on older systems, or on systems that do not have 2PBE configured, has no
effect. Toggling this bit on (on a 2PBE system) is only accepted by the IMM
service when there is only one SC available.

We recommend that any deployment of OpenSAF that intends to allow usage of
2PBE invoke the toggling on of this bit in the wrapper function for performing
a planned stop of an SC. Note however that the operation will only succeed
when the SC has gone down, i.e. there is only one SC available. Similarly, if
there is any alarm generated when an SC has gone down and not come back up
quickly enough (a "node repair needed" alarm), then we suggest that the alarm
trigger the invocation of the admin op to toggle this flag on.
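
As a hypothetical illustration of that recommendation, a wrapper for a planned
SC stop could retry the admin-op until the SC is actually down and the IMM
accepts entering 1safe2PBE; the retry count and sleep interval are examples:

  # After ordering the planned stop of the SC:
  for attempt in 1 2 3 4 5 6; do
    immadm -o 1 -a safImmService -p opensafImmNostdFlags:SA_UINT32_T:8 \
       opensafImm=opensafImm,safApp=safImmService && break
    sleep 10
  done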

For "normal", but unplanned, processor restarts we recommend that this flag
not be toggled on. This means that for such processor restarts, persistent
writes will not be allowed until both SCs are available again.

2PBE with spares (#79 & #1925)
------------------------------
From OpenSAF 5.0 the mds_register for the IMMD is delayed until the amfd comes
up and assigns the role. Because of this the standby role is delayed. When
spare SCs are configured there is a chance that the chosen standby IMMND is on
a different node than the node that actually got the standby role. It is
therefore not recommended to configure 2PBE with spares.