Difference: CorosyncSinglePrimaryConfiguration (2 vs. 3)

Revision 3 - 2010-10-12 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 22 to 22
 
crm configure show
Changed:
<
<
To get a constantly-updated display of the configuration, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
>
>
To get a constantly-updated display of the resource status, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
 
crm_mon
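For a one-time snapshot instead of a continuously-updating display (handy in scripts or when capturing the state for documentation), crm_mon also has a one-shot mode; this is a standard pacemaker option rather than anything specific to this cluster:

crm_mon -1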
Line: 57 to 57
 # ... timeout = how long to wait before you assume a resource is dead.
Added:
>
>
How to find out which scripts exist, that is, which resources can be controlled by the HA cluster:
ra classes
Based on the result, I looked at:
ra list ocf heartbeat
To find out what IPaddr2 parameters I needed, I used:
ra meta ocf:heartbeat:IPaddr2
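As an illustration only (the address, netmask, and interface below are placeholders, not the values actually used at Nevis), the parameters listed by "ra meta" map directly onto a primitive definition such as:

configure primitive ExampleIP ocf:heartbeat:IPaddr2 params ip="192.168.1.100" cidr_netmask="24" nic="eth0" op monitor interval="30s"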
 

Configuration

This work was done in Sep-2010. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.

Added:
>
>
# The commands ultimately used to configure the high-availability (HA) servers:
 # The beginning: make sure corosync is running on both hypatia and orestes:

/sbin/service corosync start
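# (Before going on, it's worth confirming that corosync actually came up on both
# nodes. These are standard commands rather than part of the recorded Nevis
# procedure.)

/sbin/service corosync status
crm status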

Line: 116 to 131
 cib commit ip
 quit

Changed:
<
<
# DRBD is a service that syncronizes the hard drives between two machines.
>
>
# DRBD is a service that synchronizes the hard drives between two machines.
 # For our cluster, one machine will have access to the "master" copy
 # and make all the changes to that copy; the other machine will have the
 # "slave" copy and mindlessly duplicate all the changes.
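# (The DRBD resource commands themselves are unchanged in this revision and so
# don't appear in this diff. For reference, the usual pacemaker pattern for a
# master/slave DRBD resource is a drbd primitive wrapped in an "ms" resource;
# the names and timings below are placeholders, not a transcript of the actual
# Nevis commands.)

configure primitive WorkDrbd ocf:linbit:drbd params drbd_resource="work" op monitor interval="30s"
configure ms WorkDrbdMS WorkDrbd meta master-max="1" clone-max="2" notify="true"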
Line: 162 to 177
 cib commit drbd
 quit
Changed:
<
<
# Now try a resource that depends on ordering: On the node that's has the master # resource for "work," mount that disk image on as /work.
>
>
# Now try a resource that depends on ordering: On the node that has the master
# resource for "work," mount that disk image as /work.
 crm cib new workdisk
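# (The commands inside this shadow CIB are unchanged in this revision and so
# don't appear in the diff. For reference, the general shape of a DRBD-backed
# mount is a Filesystem primitive tied to the DRBD master; the device, names,
# and filesystem type below are placeholders, re-using the sketch names from
# the DRBD note above, not the actual Nevis commands.)

configure primitive WorkFS ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/work" fstype="ext3"
configure colocation WorkFSOnMaster inf: WorkFS WorkDrbdMS:Master
configure order WorkFSAfterDrbd inf: WorkDrbdMS:promote WorkFS:start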

Line: 283 to 298
  configure primitive Cups lsb:cups

Changed:
<
<
# The print server must be associated with the main IP address. # A score of "inf" means "infinity"; if it can't be run on the # machine that's offering the main IP address, it won't run at all.

configure colocation CupsWithMainIP inf: Cups MainIPGroup

# But that's not the only requirement. Cups stores its spool files in # /var/spool/cups. If the cups service were to switch to a different server, # we want the new server to see the spools files. So create /var/nevis/cups, # link it with:

>
>
# Cups stores its spool files in /var/spool/cups. If the cups service
# were to switch to a different server, we want the new server to see
# the spooled files. So create /var/nevis/cups, link it with:
 # mv /var/spool/cups /var/spool/cups.ori
 # ln -sf /var/nevis/cups /var/spool/cups
 # and demand that the cups service only start if /var/nevis (and the other
 # high-availability directories) have been mounted.

Added:
>
>
# A score of "inf" means "infinity"; if it can't be run on the
# machine that mounted all the admin directories, it won't run at all.

  configure colocation CupsWithVar inf: Cups AdminDirectoriesGroup

# In order to prevent chaos, make sure that the high-availability directories

Line: 327 to 338
 cib commit services
 quit
Changed:
<
<
# The high-availability servers export the /usr/nevis directory to all the # other machines on the Nevis Linux cluster. NFS exporting of a shared # directory can be a little tricky. As with CUPS spooling, we want to preserve # the NFS export state in a way that the backup server can pick it up. # The safest way to do this is to create a small separate LVM partition # ("nfs") and mount it as "/var/lib/nfs".
>
>
# The high-availability servers export some of the admin directories to other
# systems, both real and virtual; for example, the /usr/nevis directory is
# exported to all the other machines on the Nevis Linux cluster.

# NFS exporting of a shared directory can be a little tricky. As with CUPS
# spooling, we want to preserve the NFS export state in a way that the
# backup server can pick it up. The safest way to do this is to create a
# small separate LVM partition ("nfs") and mount it as "/var/lib/nfs",
# the NFS directory that contains files that keep track of the NFS state.
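# (Creating that partition and handing it to the cluster follows the same
# Filesystem pattern as the other admin directories. The volume-group name,
# size, and filesystem type below are placeholders, not the values actually
# used at Nevis.)

lvcreate -L 1G -n nfs admin_vg
mkfs -t ext3 /dev/admin_vg/nfs
configure primitive NfsState ocf:heartbeat:Filesystem params device="/dev/admin_vg/nfs" directory="/var/lib/nfs" fstype="ext3"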

  crm cib new nfs
Line: 354 to 368
  quit

Changed:
<
<
# The whole point of this is to be able to run guest virtual machines under the # control of the high-availability service. Here is the set-up for one example
>
>
# The whole point of the entire setup is to be able to run guest virtual machines
# under the control of the high-availability service. Here is the set-up for one example
# virtual machine. I previously created the hogwarts virtual machine and copied its
# configuration to /xen/configs/hogwarts.cfg.

crm cib new hogwarts

Added:
>
>
# Give the virtual machine a long stop timeout before flagging an error.
# Sometimes it takes a while for Linux to shut down.

configure primitive Hogwarts ocf:heartbeat:Xen params xmfile="/xen/configs/Hogwarts.cfg" op stop interval="0" timeout="240"

  # All the virtual machine files are stored in the /xen partition, which is one
Changed:
<
<
# of the high-availability admin directories. Make sure the directory is mounted # before starting the virtual machine.
>
>
# of the high-availability admin directories. The virtual machine must run on
# the system with this directory.
 
Deleted:
<
<
configure primitive Hogwarts ocf:heartbeat:Filesystem params xmfile="/xen/configs/Hogwarts.cfg"
  configure colocation HogwartsWithDirectories inf: Hogwarts AdminDirectoriesGroup
Changed:
<
<
configure order DirectoriesBeforeHogwarts inf: AdminDirectoriesGroup Hogwarts
>
>
# All of the virtual machines depend on NFS-mounting directories which
# are exported by the HA server. The safest thing to do is to make sure
# NFS is running on the HA server before starting the virtual machine.

configure order NfsBeforeHogwarts inf: Nfs Hogwarts

 cib commit hogwarts
 quit
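# (After the commit, the new resource can be checked and, if need be, pushed to
# a particular node by hand. These are standard crm commands, not part of the
# recorded Nevis procedure; hypatia is just one of the two HA nodes.)

crm resource status Hogwarts
crm resource migrate Hogwarts hypatia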
Line: 384 to 409
 # The STONITH mechanism means: If a node fails, the remaining node(s) in a cluster will
 # force a permanent shutdown of the failed node; it can't automatically come back up again.
Changed:
<
<
# This also known as "fencing": once a node fails, it can't be allowed to re-join the # cluster.
>
>
# This is a special case of "fencing": once a node or resource fails, it can't be allowed
# to start up again automatically.
 # In general, there are many ways to implement a STONITH mechanism. At Nevis, the way
 # we do it is to shut off the power on the UPS connected to the failed node.
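# (To see which STONITH plugins are installed on a node, the same "ra" commands
# used earlier in this page apply; the UPS-specific plugin used at Nevis should
# be among those listed.)

ra classes
ra list stonith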
Line: 475 to 500
  configure colocation NoRemoteMountWithDirectories -inf: LibraryOnWork AdminDirectoriesGroup

# Determine on which machine we mount library:/usr/nevis after we

Changed:
<
<
# figure out which machine is running AdminDirectoriesGroup. "symmetrical=false" # means that if we're turning off the resource for some reason, we don't # have to wait for LibraryOnWork to be stopped before we try to stop # AdminDirectoriesGroup (since these resources always run on different machines).
>
>
# figure out which machine is running AdminDirectoriesGroup.
 
Changed:
<
<
configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork symmetrical=false
>
>
configure order DirectoresBeforeLibrary inf: AdminDirectoriesGroup LibraryOnWork
 # The standard condor execution service. As with all the batch nodes,
 # I've already configured /etc/condor/condor_config.local and created
 