Difference: CorosyncSinglePrimaryConfiguration (1 vs. 10)

Revision 10 - 2014-07-01 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Added:
>
>
Archived 20-Sep-2013: The high-availability cluster has been set aside in favor of a more traditional single-box admin server. HA is grand in theory, but in the three years we operated the cluster we had no hardware problems that the HA set-up would have prevented, yet many hours of downtime due to problems with the HA software itself. This mailing-list post has some details.
 This is a reference page. It contains a text file that describes how the high-availability Pacemaker/Corosync configuration was set up on two administrative servers, hypatia and orestes.

Files

Revision 9 - 2012-08-11 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 544 to 544
   cib commit condor
   quit
Added:
>
>
META TOPICMOVED by="WilliamSeligman" date="1344648737" from="Nevis.CorosyncConfiguration" to="Nevis.CorosyncSinglePrimaryConfiguration"

Revision 8 - 2012-06-13 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 388 to 388
  # Give the virtual machine a long stop interval before flagging an error. # Sometimes it takes a while for Linux to shut down.

Changed:
<
<
configure primitive Hogwarts ocf:heartbeat:Filesystem params
>
>
configure primitive Hogwarts ocf:heartbeat:Xen params
  xmfile="/xen/configs/Hogwarts.cfg" op stop interval="0" timeout="240"

Revision 7 - 2011-10-17 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 114 to 114
 # and make all the changes to that copy; the other machine will have the # "slave" copy and mindlessly duplicate all the changes.
Added:
>
>
# I previously configured the DRBD resources 'admin' and 'work'. What the # following commands do is put the maintenance of these resources under # the control of Pacemaker.
 crm # Define a "shadow" configuration, to test things without committing them # to the HA cluster:
Line: 376 to 380
 # configuration to /xen/configs/hogwarts.cfg.

# I duplicated the same procedure for franklin (mail server), ada (web server), and

Changed:
<
<
# so on, but I don't know that here.
>
>
# so on, but I don't show that here.
  crm cib new hogwarts

Revision 6 - 2011-10-14 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 72 to 72
 

Configuration

Changed:
<
<
This work was done in Sep-2010. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.
>
>
This work was done in Sep-2010, with major revisions for stability in Aug-2011. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.
 
# The commands ultimately used to configure the high-availability (HA) servers:

Revision 5 - 2011-10-14 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 136 to 136
  configure colocation AdminWithMainIP inf: MainIPGroup Admin:Master
Added:
>
>
# We want to wait before assigning IPs to a node until we know that
# Admin has been promoted to master on that node.
configure order AdminBeforeMainIP inf: Admin:promote MainIPGroup
  # I like these commands, so commit them to the running configuration.

cib commit drbd

Line: 191 to 195
  # One more thing: It's important that we not try to mount the directory # until after Work has been promoted to master on the node.

Added:
>
>
# A score of "inf" means "infinity"; if the DRBD resource 'work' can't # be set up, then don't mount the /work partition.

  configure order WorkBeforeDirectory inf: Work:promote WorkDirectory:start

cib commit workdisk quit

Changed:
<
<
# We've made the relatively-unimportant work DRBD master function. Let's do it for real. # Prevously I created some LVM volumes on the admin DRBD master. We need to use a
>
>
# We've made the relatively-unimportant DRBD resource 'work' function. Let's do it for 'admin'. # Previously I created some LVM volumes on the admin DRBD master. We need to use a
# resource to activate them, but we can't activate them until after the Admin:Master
# is loaded.
crm
Line: 289 to 296
  # and demand that the cups service only start if /var/nevis (and the other # high-availability directories) have been mounted.

Deleted:
<
<
# A score of "inf" means "infinity"; if it can't be run on the # machine that mounted all the admin directories, it won't run at all.

  configure colocation CupsWithVar inf: Cups AdminDirectoriesGroup

# In order to prevent chaos, make sure that the high-availability directories

Line: 341 to 345
configure colocation NfsStateWithVar inf: NfsStateDirectory AdminDirectoriesGroup
configure order VarBeforeNfsState inf: AdminDirectoriesGroup NfsStateDirectory
Changed:
<
<
# Now that the NFS state directory is mounted, we can start the nfslockd. Note that
>
>
# Now that the NFS state directory is mounted, we can start nfslockd. Note that
# we're starting NFS lock on both the primary and secondary HA systems;
# by default a "clone" resource is started on all systems in a cluster.
Added:
>
>
# (Placing nfslockd under the control of Pacemaker turned out to be key to # successful transfer of cluster services to another node. The nfslockd and # nfs daemon information stored in /var/lib/nfs have to be consistent.)
  configure primitive NfsLockInstance lsb:nfslock
Changed:
<
<
clone NfsLock NfsLockInstance
>
>
configure clone NfsLock NfsLockInstance
 
Changed:
<
<
# Once nfslockd has been set up, we can start NFS.
>
>
configure order NfsStateBeforeNfsLock inf: NfsStateDirectory NfsLock

# Once nfslockd has been set up, we can start NFS. (We say to colocate # NFS with 'NfsStateDirectory', instead of nfslockd, because nfslockd # is going to be started on both nodes.)

configure primitive Nfs lsb:nfs
configure colocation NfsWithNfsState inf: Nfs NfsStateDirectory
Changed:
<
<
configure order NfsStateBeforeNfs inf: NfsStateDirectory Nfs
>
>
configure order NfsLockBeforeNfs inf: NfsLock Nfs
  cib commit nfs quit
Line: 363 to 375
 # virtual machine. I previously created the hogwarts virtual machine and copied its # configuration to /xen/configs/hogwarts.cfg.
Added:
>
>
# I duplicated the same procedure for franklin (mail server), ada (web server), and # so on, but I don't know that here.
 crm cib new hogwarts

Line: 405 to 420
# In general, there are many ways to implement a STONITH mechanism. At Nevis, the way
# we do it is to shut off the power on the UPS connected to the failed node.
Changed:
<
<
# (By the way, this is why you have to restart hypatia and orestes at the same time. # If you just restart one, the STONITH mechanism will cause the UPS on the restarting
>
>
# (By the way, this is why you have to be careful about restarting hypatia or orestes. # The STONITH mechanism may cause the UPS on the restarting
 # computer to turn off the power; it will never come back up.)

# At Nevis, the UPSes are monitored and controlled using the NUT package

Line: 466 to 481
 # For orestes to do this, it requires the condor service. It also requires that # library:/usr/nevis is mounted, the same as every other batch machine on the # Nevis condor cluster. We can't use the automount daemon (amd) to do this for
Changed:
<
<
# us, the way we do on the other batch nodes, so we have to make corosync do the
>
>
# us, the way we do on the other batch nodes; we have to make corosync do the
 # mounts.

crm cib new condor

Changed:
<
<
# Mount library:/usr/nevis
>
>
# Mount library:/usr/nevis. A bit of a name confusion here: there's a /work
# partition on the primary node, but the name 'LibraryOnWork' means that
# the nfs-mount of /usr/nevis is located on the secondary or "work" node.
  configure primitive LibraryOnWork ocf:heartbeat:Filesystem params device="library:/usr/nevis" directory="/usr/nevis"
Changed:
<
<
fstype="nfs" OCF_CHECK_LEVEL="20"
>
>
fstype="nfs"
 
Changed:
<
<
# Corosync must NOT mount library:/usr/nevis on the system has already
>
>
# Corosync must not NFS-mount library:/usr/nevis on the system that has already
  # mounted /usr/nevis directly as part of AdminDirectoriesGroup # described above.

Line: 489 to 506
  configure colocation NoRemoteMountWithDirectories -inf: LibraryOnWork AdminDirectoriesGroup
Changed:
<
<
# Determine on which machine we mount library:/usr/nevis after we # figure out which machine is running AdminDirectoriesGroup.
>
>
# Determine on which machine we mount library:/usr/nevis after the NFS # export of /usr/nevis has been set up.
 
Changed:
<
<
configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork
>
>
configure order NfsBeforeLibrary inf: Nfs LibraryOnWork
  # Define the IPs associated with the backup system, and group them together. # This is a non-critical definition, and I don't want to assign it until the more important
Line: 515 to 532
# If we're able to mount library:/usr/nevis, then it's safe to start condor.
# If we can't mount library:/usr/nevis, then condor will never be started.
Added:
>
>
# (We stated above that AssistantIPGroup won't start until after LibraryOnWork).
 
Changed:
<
<
configure colocation CondorWithLibrary inf: Condor LibraryOnWork

# library:/usr/nevis must be mounted before condor starts.

configure order LibraryBeforeCondor inf: LibraryOnWork Condor

>
>
configure colocation CondorWithAssistant inf: Condor AssistantIPGroup
configure order AssistantBeforeCondor inf: AssistantIPGroup Condor
  cib commit condor quit

Revision 4 - 2011-10-13 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 108 to 108
# test groups of commands before I commit them. (I omit the "configure show"
# and "status" commands that I frequently typed in, in order to see that
# everything was correct.)
Deleted:
<
<
crm
   # Define a "shadow" configuration, to test things without commiting them
   # to the HA cluster:
   cib new ip

   # Define the IPs associated with the backup system, and group them together.
   configure primitive AssistantIP ocf:heartbeat:IPaddr2 params ip=129.236.252.10 \
      cidr_netmask=32 op monitor interval=30s
   configure primitive AssistantLocalIP ocf:heartbeat:IPaddr2 params ip=10.44.7.10 \
      cidr_netmask=32 op monitor interval=30s
   configure group AssistantIPGroup AssistantIP AssistantLocalIP

   # Define a "colocation" = how much do you want these things together?
   # A score of -1000 means to try to keep them on separate machines as
   # much as possible, but allow them on the same machine if necessary.

   configure colocation SeparateIPs -1000: MainIPGroup AssistantIPGroup

   # I like these commands, so commit them to the running configuration.

   cib commit ip
   quit

  # DRBD is a service that synchronizes the hard drives between two machines. # For our cluster, one machine will have access to the "master" copy
Line: 137 to 115
 # "slave" copy and mindlessly duplicate all the changes.

crm

Added:
>
>
# Define a "shadow" configuration, to test things without committing them # to the HA cluster:
  cib new drbd

# The "drbd_resource" parameter points to a configuration defined in /etc/drbd.d/

Line: 154 to 134
  # The machine that gets the master copy (the one that will make changes to the drive) # should also be the one with the main IP address.

Changed:
<
<
configure colocation AdminWithMainIP inf: Admin:Master MainIPGroup
>
>
configure colocation AdminWithMainIP inf: MainIPGroup Admin:Master

# I like these commands, so commit them to the running configuration.

  cib commit drbd

Line: 166 to 148
  configure master Work WorkDrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false

Changed:
<
<
# I prefer the work directory to be on the main admin box, but it doesn't have to be.
>
>
# I prefer the work directory to be on the main admin box, but it doesn't have to be. "500:" is a
# weighting factor; compare it to "inf:" (for infinity), which is used in most of these commands.
  configure colocation WorkPrefersMain 500: Work:Master MainIPGroup

Line: 358 to 341
configure colocation NfsStateWithVar inf: NfsStateDirectory AdminDirectoriesGroup
configure order VarBeforeNfsState inf: AdminDirectoriesGroup NfsStateDirectory
Changed:
<
<
# Once that directory has been set up, we can start NFS.
>
>
# Now that the NFS state directory is mounted, we can start the nfslockd. Note that
# we're starting NFS lock on both the primary and secondary HA systems;
# by default a "clone" resource is started on all systems in a cluster.

configure primitive NfsLockInstance lsb:nfslock
clone NfsLock NfsLockInstance

# Once nfslockd has been set up, we can start NFS.

configure primitive Nfs lsb:nfs
configure colocation NfsWithNfsState inf: Nfs NfsStateDirectory
Line: 504 to 494
  configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork
Added:
>
>
# Define the IPs associated with the backup system, and group them together. # This is a non-critical definition, and I don't want to assign it until the more important # "secondary" resources have been set up.

configure primitive Burr ocf:heartbeat:IPaddr2 params ip=129.236.252.10 cidr_netmask=32 op monitor interval=30s
configure primitive BurrLocal ocf:heartbeat:IPaddr2 params ip=10.44.7.10 cidr_netmask=32 op monitor interval=30s
configure group AssistantIPGroup Burr BurrLocal

colocation AssistantWithLibrary inf: AssistantIPGroup LibraryOnWork
order LibraryBeforeAssistant inf: LibraryOnWork AssistantIPGroup

  # The standard condor execution service. As with all the batch nodes, # I've already configured /etc/condor/condor_config.local and created # scratch directories in /data/condor.

Revision 3 - 2010-10-12 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 22 to 22
 
crm configure show
Changed:
<
<
To get a constantly-updated display of the configuration, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
>
>
To get a constantly-updated display of the resource status, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
 
crm_mon
Line: 57 to 57
# ... timeout = how long to wait before you assume a resource is dead.
Added:
>
>
How to find out which scripts exist, that is, which resources can be controlled by the HA cluster:
ra classes
Based on the result, I looked at:
ra list ocf heartbeat
To find out what IPaddr2 parameters I needed, I used:
ra meta ocf:heartbeat:IPaddr2
 

Configuration

This work was done in Sep-2010. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.

Added:
>
>
# The commands ultimately used to configure the high-availability (HA) servers:
 # The beginning: make sure corosync is running on both hypatia and orestes:

/sbin/service corosync start

Line: 116 to 131
  cib commit ip quit

Changed:
<
<
# DRBD is a service that syncronizes the hard drives between two machines.
>
>
# DRBD is a service that synchronizes the hard drives between two machines.
 # For our cluster, one machine will have access to the "master" copy # and make all the changes to that copy; the other machine will have the # "slave" copy and mindlessly duplicate all the changes.
Line: 162 to 177
  cib commit drbd quit
Changed:
<
<
# Now try a resource that depends on ordering: On the node that's has the master # resource for "work," mount that disk image on as /work.
>
>
# Now try a resource that depends on ordering: On the node that has the master # resource for "work," mount that disk image as /work.
 crm cib new workdisk

Line: 283 to 298
  configure primitive Cups lsb:cups

Changed:
<
<
# The print server must be associated with the main IP address. # A score of "inf" means "infinity"; if it can't be run on the # machine that's offering the main IP address, it won't run at all.

configure colocation CupsWithMainIP inf: Cups MainIPGroup

# But that's not the only requirement. Cups stores its spool files in # /var/spool/cups. If the cups service were to switch to a different server, # we want the new server to see the spools files. So create /var/nevis/cups, # link it with:

>
>
# Cups stores its spool files in /var/spool/cups. If the cups service # were to switch to a different server, we want the new server to see # the spooled files. So create /var/nevis/cups, link it with:
  # mv /var/spool/cups /var/spool/cups.ori # ln -sf /var/nevis/cups /var/spool/cups # and demand that the cups service only start if /var/nevis (and the other # high-availability directories) have been mounted.

Added:
>
>
# A score of "inf" means "infinity"; if it can't be run on the # machine that mounted all the admin directories, it won't run at all.

  configure colocation CupsWithVar inf: Cups AdminDirectoriesGroup

# In order to prevent chaos, make sure that the high-availability directories

Line: 327 to 338
  cib commit services quit
Changed:
<
<
# The high-availability servers export the /usr/nevis directory to all the # other machines on the Nevis Linux cluster. NFS exporting of a shared # directory can be a little tricky. As with CUPS spooling, we want to preserve # the NFS export state in a way that the backup server can pick it up. # The safest way to do this is to create a small separate LVM partition # ("nfs") and mount it as "/var/lib/nfs".
>
>
# The high-availability servers export some of the admin directories to other # systems, both real and virtual; for example, the /usr/nevis directory is # exported to all the other machines on the Nevis Linux cluster.

# NFS exporting of a shared directory can be a little tricky. As with CUPS # spooling, we want to preserve the NFS export state in a way that the # backup server can pick it up. The safest way to do this is to create a # small separate LVM partition ("nfs") and mount it as "/var/lib/nfs", # the NFS directory that contains files that keep track of the NFS state.

  crm cib new nfs
Line: 354 to 368
  quit

Changed:
<
<
# The whole point of this is to be able to run guest virtual machines under the # control of the high-availability service. Here is the set-up for one example
>
>
# The whole point of the entire setup is to be able to run guest virtual machines # under the control of the high-availability service. Here is the set-up for one example
 # virtual machine. I previously created the hogwarts virtual machine and copied its # configuration to /xen/configs/hogwarts.cfg.

crm cib new hogwarts

Added:
>
>
# Give the virtual machine a long stop interval before flagging an error. # Sometimes it takes a while for Linux to shut down.

configure primitive Hogwarts ocf:heartbeat:Filesystem params xmfile="/xen/configs/Hogwarts.cfg" op stop interval="0" timeout="240"

  # All the virtual machine files are stored in the /xen partition, which is one
Changed:
<
<
# of the high-availability admin directories. Make sure the directory is mounted # before starting the virtual machine.
>
>
# of the high-availability admin directories. The virtual machine must run on # the system with this directory.
 
Deleted:
<
<
configure primitive Hogwarts ocf:heartbeat:Filesystem params xmfile="/xen/configs/Hogwarts.cfg"
  configure colocation HogwartsWithDirectories inf: Hogwarts AdminDirectoriesGroup
Changed:
<
<
configure order DirectoriesBeforeHogwarts inf: AdminDirectoriesGroup Hogwarts
>
>
# All of the virtual machines depend on NFS-mounting directories which # are exported by the HA server. The safest thing to do is to make sure # NFS is running on the HA server before starting the virtual machine.

configure order NfsBeforeHogwarts inf: Nfs Hogwarts

  cib commit hogwarts quit
Line: 384 to 409
  # The STONITH mechanism means: If a node fails, the remaining node(s) in a cluster will # force a permanent shutdown of the failed node; it can't automatically come back up again.
Changed:
<
<
# This also known as "fencing": once a node fails, it can't be allowed to re-join the # cluster.
>
>
# This is a special case of "fencing": once a node or resource fails, it can't be allowed # to start up again automatically.
  # In general, there are many ways to implement a STONITH mechanism. At Nevis, the way # we do it is to shut-off the power on the UPS connected to the failed node.
Line: 475 to 500
  configure colocation NoRemoteMountWithDirectories -inf: LibraryOnWork AdminDirectoriesGroup

# Determine on which machine we mount library:/usr/nevis after we

Changed:
<
<
# figure out which machine is running AdminDirectoriesGroup. "symmetrical=false" # means that if we're turning off the resource for some reason, we don't # have to wait for LibraryOnWork to be stopped before we try to stop # AdminDirectoriesGroup (since these resources always run on different machines).
>
>
# figure out which machine is running AdminDirectoriesGroup.
 
Changed:
<
<
configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork symmetrical=false
>
>
configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork
  # The standard condor execution service. As with all the batch nodes, # I've already configured /etc/condor/condor_config.local and created

Revision 2 - 2010-09-30 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

This is a reference page. It contains a text file that describes how the high-availability Pacemaker/Corosync configuration was set up on two administrative servers, hypatia and orestes.

Changed:
<
<
This may help as you work your way through the configuration:
>
>

Files

Key HA configuration files. Note: Even in an emergency, there's no reason to edit these files:

/etc/drbd.conf
/etc/drbd.d/*.res
/etc/lvm/lvm.conf
/etc/corosync/corosync.conf
/home/bin/nut.sh
/home/bin/rsync-config.sh # Daily rsync from hypatia to orestes:

Commands

 
Added:
>
>
The configuration has definitely changed from that listed below. To see the current configuration, run this as root on either hypatia or orestes:
crm configure show
To get a constantly-updated display of the configuration, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
crm_mon
For a GUI, you can use this utility. You have to select "Connect" and log in via an account that's a member of the haclient group; you may have to edit /etc/group.
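For example, a hypothetical way to add an account to that group (substitute the real user name):
usermod -a -G haclient some_username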
 
Changed:
<
<
# Concepts: crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=192.168.85.3
>
>
crm_gui &
You can run the above commands via sudo, but you'll have to extend your path; e.g.,
export PATH=/sbin:/usr/sbin:${PATH}
sudo crm_mon

Concepts

This may help as you work your way through the configuration:

crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=192.168.85.3 \
  cidr_netmask=32 op monitor interval=30s

# Which is composed of

Changed:
<
<
* crm ::= "corosync resource manager", the command we're executing
>
>
* crm ::= "cluster resource manager", the command we're executing
* primitive ::= The type of resource object that we’re creating.
* IP ::= Our name for the resource
* IPaddr2 ::= The script to call
Line: 25 to 57
# ... timeout = how long to wait before you assume a resource is dead.
Added:
>
>

Configuration

 This work was done in Sep-2010. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.

Revision 1 - 2010-09-28 - WilliamSeligman

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

This is a reference page. It contains a text file that describes how the high-availability Pacemaker/Corosync configuration was set up on two administrative servers, hypatia and orestes.

This may help as you work your way through the configuration:

# Concepts:
crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=192.168.85.3 \
   cidr_netmask=32 op monitor interval=30s

# Which is composed of
    * crm ::= "corosync resource manager", the command we're executing
    * primitive ::= The type of resource object that we’re creating.
    * IP ::= Our name for the resource
    * IPaddr2 ::= The script to call
    * ocf ::= The standard it conforms to
    * ip=192.168.85.3 ::= Parameter(s) as name/value pairs
    * cidr_netmask ::= netmask; 32-bits means use this exact IP address
    * op ::== what follows are options
    * monitor interval=30s ::= check every 30 seconds that this resource is working

# ... timeout = how long to wait before you assume a resource is dead.

This work was done in Sep-2010. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.

# The beginning: make sure corosync is running on both hypatia and orestes:
/sbin/service corosync start

# The following line is needed because we have only two machines in 
# the HA cluster.

crm configure property no-quorum-policy=ignore

# We'll configure STONITH later (see below)

crm configure property stonith-enabled=false

# Define IP addresses to be managed by the HA systems.

crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=129.236.252.11 \
   cidr_netmask=32 op monitor interval=30s
crm configure primitive LocalIP ocf:heartbeat:IPaddr2 params ip=10.44.7.11 \
   cidr_netmask=32 op monitor interval=30s
crm configure primitive SandboxIP ocf:heartbeat:IPaddr2 params ip=10.43.7.11 \
   cidr_netmask=32 op monitor interval=30s
   
# Group these together, so they'll all be assigned to the same machine.
# The name of the group is "MainIPGroup".

crm configure group MainIPGroup ClusterIP LocalIP SandboxIP

# Let's continue by entering the crm utility for short sessions. I'm going to 
# test groups of commands before I commit them. (I omit the "configure show"
# and "status" commands that I frequently typed in, in order to see that 
# everything was correct.)
crm
   # Define a "shadow" configuration, to test things without commiting them
   # to the HA cluster:
   cib new ip
   
   # Define the IPs associated with the backup system, and group them together.
   configure primitive AssistantIP ocf:heartbeat:IPaddr2 params ip=129.236.252.10 \
      cidr_netmask=32 op monitor interval=30s
   configure primitive AssistantLocalIP ocf:heartbeat:IPaddr2 params ip=10.44.7.10 \
      cidr_netmask=32 op monitor interval=30s
   configure group AssistantIPGroup AssistantIP AssistantLocalIP
   
   # Define a "colocation" = how much do you want these things together?
   # A score of -1000 means to try to keep them on separate machines as
   # much as possible, but allow them on the same machine if necessary.
   
   configure colocation SeparateIPs -1000: MainIPGroup AssistantIPGroup
   
   # I like these commands, so commit them to the running configuration.
   
   cib commit ip
   quit
   
# DRBD is a service that syncronizes the hard drives between two machines.
# For our cluster, one machine will have access to the "master" copy
# and make all the changes to that copy; the other machine will have the
# "slave" copy and mindlessly duplicate all the changes.

crm
   cib new drbd
   
   # The "drbd_resource" parameter points to a configuration defined in /etc/drbd.d/
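   # (For reference only -- not the actual Nevis file -- a resource definition
   # in /etc/drbd.d/admin.res might look roughly like this; the DRBD device,
   # backing disk, and addresses below are hypothetical:
   #
   #    resource admin {
   #      protocol  C;
   #      device    /dev/drbd1;
   #      meta-disk internal;
   #      on hypatia.nevis.columbia.edu {
   #        disk    /dev/sdb1;
   #        address 10.44.7.101:7788;
   #      }
   #      on orestes.nevis.columbia.edu {
   #        disk    /dev/sdb1;
   #        address 10.44.7.102:7788;
   #      }
   #    }  )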
   
   configure primitive AdminDrbd ocf:linbit:drbd params drbd_resource=admin op monitor interval=60s
   
   # DRBD functions with a "master/slave" setup as described above. The following command
   # defines the name of the master disk partition ("Admin"). The remaining parameters
   # clarify that there are two copies, but only one can be the master, and
   # at most one can be a slave.
   
   configure master Admin AdminDrbd meta master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true globally-unique=false
      
   # The machine that gets the master copy (the one that will make changes to the drive)
   # should also be the one with the main IP address.
   
   configure colocation AdminWithMainIP inf: Admin:Master MainIPGroup
   
   cib commit drbd
   
   # Things look good, so let's add another disk resource. I defined another drbd resource
   # with some spare disk space, called "work". The idea is that I can play with alternate 
   # virtual machines and save them on "work" before I copy them to the more robust "admin".
   
   configure primitive WorkDrbd ocf:linbit:drbd params drbd_resource=work op monitor interval=60s
   configure master Work WorkDrbd meta master-max=1 master-node-max=1 \
      clone-max=2 clone-node-max=1 notify=true globally-unique=false
      
   # I prefer the work directory to be on the main admin box, but it doesn't have to be.
   
   configure colocation WorkPrefersMain 500: Work:Master MainIPGroup
      
   # Given a choice, try to put the Admin:Master on hypatia
   
   configure location DefinePreferredMainNode Admin 100: hypatia.nevis.columbia.edu

   cib commit drbd
   quit

# Now try a resource that depends on ordering: On the node that's has the master
# resource for "work," mount that disk image on as /work.
crm
   cib new workdisk
   
   # To find out that there was an "ocf:heartbeat:Filesystem" that I could use,
   # I used the command:
   ra classes
   
   # Based on the result, I looked at:
   
   ra list ocf heartbeat
   
   # To find out what Filesystem parameters I needed, I used:
   
   ra meta ocf:heartbeat:Filesystem
   
   # All of the above led me to create the following resource configuration:
   
   configure primitive WorkDirectory ocf:heartbeat:Filesystem \
      params device="/dev/drbd2" directory="/work" fstype="ext4"
      
   # Note that I had previously created an ext4 filesystem on /dev/drbd2.
   
   # Now specify that we want this to be on the same node as Work:Master:
   
   configure colocation DirectoryWithWork inf: WorkDirectory Work:Master
   
   # One more thing: It's important that we not try to mount the directory
   # until after Work has been promoted to master on the node.
   
   configure order WorkBeforeDirectory inf: Work:promote WorkDirectory:start
   
   cib commit workdisk
   quit

# We've made the relatively-unimportant work DRBD master function. Let's do it for real.
# Prevously I created some LVM volumes on the admin DRBD master. We need to use a 
# resource to activate them, but we can't activate them until after the Admin:Master
# is loaded.
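# (For reference, a sketch of how such volumes could have been created on the
# admin DRBD device; the device name and sizes are hypothetical, but the volume
# group and logical-volume names match the ones mounted below:
#
#    pvcreate /dev/drbd1
#    vgcreate admin /dev/drbd1
#    lvcreate -L 200G -n usr  admin
#    lvcreate -L 100G -n var  admin
#    lvcreate -L 100G -n mail admin
#    lvcreate -L 200G -n xen  admin
#    lvcreate -L 2G   -n nfs  admin  )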
crm
   cib new lvm
   
   # Activate the LVM volumes, but only after DRBD has figured out where
   # Admin:Master is located.
   
   configure primitive Lvm ocf:heartbeat:LVM \
      params volgrpname="admin"
   configure colocation LvmWithAdmin inf: Lvm Admin:Master
   configure order AdminBeforeLvm inf: Admin:promote Lvm:start
   
   cib commit lvm
   
   # Go back to the actual, live configuration
   
   cib use live
   
   # See if everything is working
   
   configure show 
   status
   
   # Go back to the shadow for more commands.
   
   cib use lvm
   
   # We have a whole bunch of filesystems on the "admin" volume group. Let's
   # create the commands to mount them.
   
   # The 'timeout="240s"' piece is to give a four-minute interval to start
   # up the mount. This allows for a "it's been too long, do an fsck" check
   # on mounting the filesystem. 
   
   # We also allow five minutes for the unmount (the "stop" operation) to complete, just in case
   # it's taking a while for some job on the server to let go of the mount.
   # It's better that it take a while to switch over the system service
   # than for the mount to be forcibly terminated.
   
   configure primitive UsrDirectory ocf:heartbeat:Filesystem \
      params device="/dev/admin/usr" directory="/usr/nevis" fstype="ext4" \
      op start interval="0" timeout="240s" \
      op stop interval="0" timeout="300s"
      
   configure primitive VarDirectory ocf:heartbeat:Filesystem \
      params device="/dev/admin/var" directory="/var/nevis" fstype="ext4" \
      op start interval="0" timeout="240s" \
      op stop interval="0" timeout="300s"
      
   configure primitive MailDirectory ocf:heartbeat:Filesystem \
      params device="/dev/admin/mail" directory="/mail" fstype="ext4" \
      op start interval="0" timeout="240s" \
      op stop interval="0" timeout="300s"
      
   configure primitive XenDirectory ocf:heartbeat:Filesystem \
      params device="/dev/admin/xen" directory="/xen" fstype="ext4" \
      op start interval="0" timeout="240s" \
      op stop interval="0" timeout="300s"
      
   configure group AdminDirectoriesGroup UsrDirectory VarDirectory MailDirectory XenDirectory
   
   # We can't mount any of them until LVM is set up:
   
   configure colocation DirectoriesWithLVM inf: AdminDirectoriesGroup Lvm
   configure order LvmBeforeDirectories inf: Lvm AdminDirectoriesGroup

   cib commit lvm
   quit

# Some standard Linux services are under corosync's control. They depend on some or
# all of the filesystems being mounted. 
   
# Let's start with a simple one: enable the printing service (cups):

crm
   cib new printing
   
   # lsb = "Linux Standard Base." It just means any service which is
   # controlled by the one of the standard scripts in /etc/init.d
   
   configure primitive Cups lsb:cups
   
   # The print server _must_ be associated with the main IP address.
   # A score of "inf" means "infinity"; if it can't be run on the
   # machine that's offering the main IP address, it won't run at all.
   
   configure colocation CupsWithMainIP inf: Cups MainIPGroup
   
   # But that's not the only requirement. Cups stores its spool files in
   # /var/spool/cups. If the cups service were to switch to a different server,
   # we want the new server to see the spools files. So create /var/nevis/cups,
   # link it with:
   #   mv /var/spool/cups /var/spool/cups.ori
   #   ln -sf /var/nevis/cups /var/spool/cups
   # and demand that the cups service only start if /var/nevis (and the other
   # high-availability directories) have been mounted.
   
   configure colocation CupsWithVar inf: Cups AdminDirectoriesGroup
   
   # In order to prevent chaos, make sure that the high-availability directories
   # have been mounted before we try to start cups.
   
   configure order VarBeforeCups inf: AdminDirectoriesGroup Cups
   
   cib commit printing
   quit

# The other services (xinetd, dhcpd) follow the same pattern as above:
# Make sure the services start on the same machine as the admin directories,
# and after the admin directories are successfully mounted.

crm
   cib new services
   
   configure primitive Xinetd lsb:xinetd
   configure primitive Dhcpd lsb:dhcpd
   
   configure colocation XinetdWithVar inf: Xinetd AdminDirectoriesGroup
   configure order VarBeforeXinetd inf: VarDirectory Xinetd
   
   configure colocation DhcpdWithVar inf: Dhcpd AdminDirectoriesGroup
   configure order VarBeforeDhcpd inf: VarDirectory Dhcpd
   
   cib commit services
   quit

# The high-availability servers export the /usr/nevis directory to all the
# other machines on the Nevis Linux cluster. NFS exporting of a shared
# directory can be a little tricky. As with CUPS spooling, we want to preserve
# the NFS export state in a way that the backup server can pick it up.
# The safest way to do this is to create a small separate LVM partition
# ("nfs") and mount it as "/var/lib/nfs".
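# (For illustration, the corresponding export of /usr/nevis would be defined in
# /etc/exports; the options shown here are hypothetical, not the actual Nevis
# settings:
#
#    /usr/nevis   *.nevis.columbia.edu(rw,sync,no_root_squash)  )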

crm
   cib new nfs
   
   # Define the mount for the NFS state directory /var/lib/nfs
   
   configure primitive NfsStateDirectory ocf:heartbeat:Filesystem \
         params device="/dev/admin/nfs" directory="/var/lib/nfs" fstype="ext4"
   configure colocation NfsStateWithVar inf: NfsStateDirectory AdminDirectoriesGroup
   configure order VarBeforeNfsState inf: AdminDirectoriesGroup NfsStateDirectory

   # Once that directory has been set up, we can start NFS.
   
   configure primitive Nfs lsb:nfs
   configure colocation NfsWithNfsState inf: Nfs NfsStateDirectory
   configure order NfsStateBeforeNfs inf: NfsStateDirectory Nfs
   
   cib commit nfs
   quit

   
# The whole point of this is to be able to run guest virtual machines under the
# control of the high-availability service. Here is the set-up for one example
# virtual machine. I previously created the hogwarts virtual machine and copied its
# configuration to /xen/configs/hogwarts.cfg.
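# (A minimal sketch of what a Xen guest configuration such as
# /xen/configs/hogwarts.cfg might contain; every value below is hypothetical:
#
#    name       = "hogwarts"
#    memory     = 2048
#    vcpus      = 2
#    bootloader = "/usr/bin/pygrub"
#    disk       = [ "phy:/dev/admin/hogwarts,xvda,w" ]
#    vif        = [ "bridge=xenbr0" ]  )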

crm
   cib new hogwarts
   
   # All the virtual machine files are stored in the /xen partition, which is one
   # of the high-availability admin directories. Make sure the directory is mounted
   # before starting the virtual machine.
   
   configure primitive Hogwarts ocf:heartbeat:Filesystem params xmfile="/xen/configs/Hogwarts.cfg"
   configure colocation HogwartsWithDirectories inf: Hogwarts AdminDirectoriesGroup
   configure order DirectoriesBeforeHogwarts inf: AdminDirectoriesGroup Hogwarts
   
   cib commit hogwarts
   quit


# An important part of a high-availability configuration is STONITH = "Shoot the
# other node in the head." Here's the idea: suppose one node fails for some reason. The
# other node will take over as needed. 

# Suppose the failed node tries to come up again. This can be a problem: The other node
# may have accumulated changes that the failed node doesn't know about. There can be
# synchronization issues that require manual intervention.

# The STONITH mechanism means: If a node fails, the remaining node(s) in a cluster will
# force a permanent shutdown of the failed node; it can't automatically come back up again.
# This also known as "fencing": once a node fails, it can't be allowed to re-join the
# cluster.

# In general, there are many ways to implement a STONITH mechanism. At Nevis, the way
# we do it is to shut off the power on the UPS connected to the failed node.

# (By the way, this is why you have to restart hypatia and orestes at the same time.
# If you just restart one, the STONITH mechanism will cause the UPS on the restarting
# computer to turn off the power; it will never come back up.)

# At Nevis, the UPSes are monitored and controlled using the NUT package
# <http://www.networkupstools.org/>; details are on the Nevis wiki at
# <http://www.nevis.columbia.edu/twiki/bin/view/Nevis/Ups>.

# The official corosync distribution from <http://www.clusterlabs.org/> 
# does not include a script for NUT, so I had to write one. It's located at
# /home/bin/nut.sh on both hypatia and orestes; there are appropriate links
# to this script from the stonith/external directory. 

# By the way, I sent the script to Cluster Labs, who accepted it.
# The next generation of their distribution will include the script.
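# (The nut.sh script itself is not reproduced here. As a rough, hypothetical
# sketch of how an external STONITH plugin works: the cluster invokes the
# script with an action argument, making the configured params (hostname, ups,
# username, password) available to it, and the script uses the NUT command-line
# tools to act on the UPS. Something along these lines:
#
#    #!/bin/sh
#    case "$1" in
#      gethosts)  echo "$hostname" ;;                                # node we can fence
#      off|reset) upscmd -u "$username" -p "$password" "$ups" load.off ;;
#      status)    upsc "$ups" ups.status > /dev/null ;;              # is the UPS reachable?
#      *)         exit 1 ;;
#    esac  )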

# The following commands implement the STONITH mechanism for our cluster:

crm
   cib new stonith
   
   # The STONITH resource that can potentially shut down hypatia.
   
   configure primitive HypatiaStonith stonith:external/nut \
      params hostname="hypatia.nevis.columbia.edu" \
      ups="hypatia-ups" username="admin" password="acdc"
      
   # The node that runs the above script cannot be hypatia; it's
   # not wise to trust a node to STONITH itself. Note that the score
   # is "negative infinity," which means "never run this resource
   # on the named node."

   configure location HypatiaStonithLoc HypatiaStonith -inf: hypatia.nevis.columbia.edu

   # The STONITH resource that can potentially shut down orestes.

   configure primitive OrestesStonith stonith:external/nut \
      params hostname="orestes.nevis.columbia.edu" \
      ups="orestes-ups" username="admin" password="acdc"

   # Again, orestes cannot be the node that runs the above script.
   
   configure location OresetesStonithLoc OrestesStonith -inf: orestes.nevis.columbia.edu
   
   cib commit stonith
   quit

# Now turn the STONITH mechanism on for the cluster.

crm configure property stonith-enabled=true


# At this point, the key elements of the high-availability configuration have
# been set up. There is one non-critical frill: One node (probably hypatia) will be 
# running the important services, while the other node (probably orestes) would
# be "twiddling its thumbs." Instead, let's have orestes do something useful: execute
# condor jobs.

# For orestes to do this, it requires the condor service. It also requires that
# library:/usr/nevis is mounted, the same as every other batch machine on the
# Nevis condor cluster. We can't use the automount daemon (amd) to do this for
# us, the way we do on the other batch nodes, so we have to make corosync do the
# mounts.

crm
   cib new condor
   
   # Mount library:/usr/nevis
      
   configure primitive LibraryOnWork ocf:heartbeat:Filesystem \
      params device="library:/usr/nevis" directory="/usr/nevis" \
      fstype="nfs" OCF_CHECK_LEVEL="20" 
      
   # Corosync must NOT mount library:/usr/nevis on the system has already 
   # mounted /usr/nevis directly as part of AdminDirectoriesGroup
   # described above. 
   
   # Note that if there's only one node remaining in the high-availability 
   # cluster, it will be running the resource AdminDirectoriesGroup, and 
   # LibraryOnWork will never be started. This is fine; if there's only one
   # node left, I _don't_ want it running condor jobs.
   
   configure colocation NoRemoteMountWithDirectories -inf: LibraryOnWork AdminDirectoriesGroup

   # Determine on which machine we mount library:/usr/nevis _after_ we
   # figure out which machine is running AdminDirectoriesGroup. "symmetrical=false" 
   # means that if we're turning off the resource for some reason, we don't
   # have to wait for LibraryOnWork to be stopped before we try to stop
   # AdminDirectoriesGroup (since these resources always run on different machines).
   
   configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork \
      symmetrical=false

   # The standard condor execution service. As with all the batch nodes,
   # I've already configured /etc/condor/condor_config.local and created
   # scratch directories in /data/condor.
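   # (For illustration only, /etc/condor/condor_config.local on a dedicated
   # execute node might contain settings along these lines; the values are
   # hypothetical, not the actual Nevis configuration:
   #
   #    DAEMON_LIST = MASTER, STARTD
   #    EXECUTE     = /data/condor/execute
   #    START       = TRUE  )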
      
   configure primitive Condor lsb:condor

   # If we're able to mount library:/usr/nevis, then it's safe to start condor.
   # If we can't mount library:/usr/nevis, then condor will never be started.
   
   configure colocation CondorWithLibrary inf: Condor LibraryOnWork
   
   # library:/usr/nevis must be mounted before condor starts.
   
   configure order LibraryBeforeCondor inf: LibraryOnWork Condor
   
   cib commit condor
   quit
 