Difference: PacemakerDualPrimaryConfiguration (4 vs. 5)

Revision 5  2013-01-04 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis particle-physics administrative cluster configuration

Line: 21 to 21
 /home/bin/recover-symlinks.sh
/etc/rc.d/rc.fix-pacemaker-delay (on hypatia only)
Changed:
<
<
The links are to an external site, pastebin; I use this in case I want to consult with someone on the HA setup. If you're reading this from a hardcopy, you can find all these files by visiting http://www.pastbin.com and searching for wgseligman 20130103.
>
>
The links are to an external site, pastebin; I use this in case I want to consult with someone on the HA setup. If you're reading this from a hardcopy, you can find all these files by visiting http://pastebin.com/u/wgseligman and searching for 20130103.
 

One-time set-up

Line: 72 to 72
 

Clustered LVM setup

Changed:
<
<
The following commands only have to be issued on one of the nodes.
>
>
Most of the following commands only have to be issued on one of the nodes. See Clusters From Scratch and Redhat Cluster Tutorial for details.
 
  • Edit /etc/lvm/lvm.conf on both systems; search this file for the initials WGS for a complete list of changes.
    • Change the filter line to search for DRBD partitions:
      filter = [ "a|/dev/drbd.*|", "a|/dev/md1|", "r|.*|" ]
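    • A hedged sketch of the other change clustered LVM typically needs (an assumption based on the clvmd/GFS2 setup described later on this page; the WGS-tagged edits in the file itself are authoritative):
      locking_type = 3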
Line: 94 to 94
 
  • Reboot both nodes.
Changed:
<
<

Commands

>
>

Pacemaker configuration

Commands

  The configuration has definitely changed from that listed below. To see the current configuration, run this as root on either hypatia or orestes:
crm configure show
Added:
>
>
To see the status of all the resources:
crm resource status
 To get a constantly-updated display of the resource status, the following command is the corosync equivalent of "top" (use Ctrl-C to exit):
crm_mon
Line: 110 to 116
 sudo crm_mon
Changed:
<
<

Concepts

>
>

Concepts

  This may help as you work your way through the configuration:
Changed:
<
<
crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=192.168.85.3 \
>
>
crm configure primitive MyIPResource ocf:heartbeat:IPaddr2 params ip=192.168.85.3 \
  cidr_netmask=32 op monitor interval=30s

# Which is composed of:
* crm ::= "cluster resource manager", the command we're executing
* primitive ::= The type of resource object that we’re creating.

Changed:
<
<
* IP ::= Our name for the resource
>
>
* MyIPResource ::= Our name for the resource
  * IPaddr2 ::= The script to call
  * ocf ::= The standard it conforms to
  * ip=192.168.85.3 ::= Parameter(s) as name/value pairs
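Once a resource like this is defined, one way to check on it (reusing the hypothetical name MyIPResource from the example above) is:

crm resource status MyIPResource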
Line: 144 to 150
 crm ra meta ocf:heartbeat:IPaddr2
Changed:
<
<

Configuration

>
>

Initial configuration guide

 
Changed:
<
<
This work was done in Sep-2010, with major revisions for stability in Aug-2011. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.
>
>
This work was done in Apr-2012. The configuration has almost certainly changed since then. Hopefully, the following commands and comments will guide you to understanding any future changes and the reasons for them.
 
# The commands ultimately used to configure the high-availability (HA) servers:
Changed:
<
<
# The beginning: make sure corosync is running on both hypatia and orestes:

/sbin/service corosync start

# The following line is needed because we have only two machines in
# the HA cluster.

>
>
# The beginning: make sure pacemaker is running on both hypatia and orestes:
 
Changed:
<
<
crm configure property no-quorum-policy=ignore
>
>
/sbin/service pacemaker status
crm node status
crm resource status
  # We'll configure STONITH later (see below)

crm configure property stonith-enabled=false

Deleted:
<
<
# Define IP addresses to be managed by the HA systems.

crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=129.236.252.11 cidr_netmask=32 op monitor interval=30s
crm configure primitive LocalIP ocf:heartbeat:IPaddr2 params ip=10.44.7.11 cidr_netmask=32 op monitor interval=30s
crm configure primitive SandboxIP ocf:heartbeat:IPaddr2 params ip=10.43.7.11 cidr_netmask=32 op monitor interval=30s

# Group these together, so they'll all be assigned to the same machine.
# The name of the group is "MainIPGroup".

crm configure group MainIPGroup ClusterIP LocalIP SandboxIP

 # Let's continue by entering the crm utility for short sessions. I'm going to
Changed:
<
<
# test groups of commands before I commit them. (I omit the "configure show' # and "status" commands that I frequently typed in, in order to see that # everything was correct.)
>
>
# test groups of commands before I commit them. I omit the "crm configure show"
# and "crm status" commands that I frequently typed in, in order to see that
# everything was correct.

# I also omit the standard resource options
# (e.g., "... op monitor interval="20" timeout="40" depth="0"...) to make the
# commands look simpler. This particular option means to check that the
# resource is running every 20 seconds, and to declare that the monitor operation
# will generate an error if 40 seconds elapse without a response. You can see the
# complete list with "crm configure show".
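# As a hedged illustration only (this resource is not part of the actual configuration;
# the name and IP reuse the hypothetical example from the Concepts section above),
# a primitive with those standard options spelled out in full would look like:
#
#   crm configure primitive MyIPResource ocf:heartbeat:IPaddr2 \
#     params ip=192.168.85.3 cidr_netmask=32 \
#     op monitor interval="20" timeout="40" depth="0"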

  # DRBD is a service that synchronizes the hard drives between two machines.
Changed:
<
<
# For our cluster, one machine will have access to the "master" copy
# and make all the changes to that copy; the other machine will have the
# "slave" copy and mindlessly duplicate all the changes.

# I previously configured the DRBD resources 'admin' and 'work'. What the
# following commands do is put the maintenance of these resources under
# the control of Pacemaker.

>
>
# When one machine makes any change to the DRBD disk, the other machine
# immediately duplicates that change on the block level. We have a dual-primary
# configuration, which means both machines can mount the DRBD disk at once.

# Start by entering the resource manager.

  crm
Added:
>
>
# Define a "shadow" configuration, to test things without committing them
# to the HA cluster:

cib new drbd
Line: 197 to 192
# to the HA cluster:

cib new drbd

Changed:
<
<
# The "drbd_resource" parameter points to a configuration defined in /etc/drbd.d/

configure primitive AdminDrbd ocf:linbit:drbd params drbd_resource=admin op monitor interval=60s

# DRBD functions with a "master/slave" setup as described above. The following command
# defines the name of the master disk partition ("Admin"). The remaining parameters
# clarify that there are two copies, but only one can be the master, and
# at most one can be a slave.

configure master Admin AdminDrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false

# The machine that gets the master copy (the one that will make changes to the drive)
# should also be the one with the main IP address.

>
>
# The "drbd_resource" parameter points to a configuration defined in /etc/drbd.d/admin.res
 
Changed:
<
<
configure colocation AdminWithMainIP inf: MainIPGroup Admin:Master
>
>
primitive AdminDrbd ocf:linbit:drbd params drbd_resource="admin" meta target-role="Master"

# The following resource defines how the DRBD resource (AdminDrbd) is to be
# duplicated ("cloned") among the nodes. The parameters clarify that there are
# two copies, one on each node, and both can be the master.

ms AdminClone AdminDrbd meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" interleave="true"
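# As a hedged aside (not part of the original command sequence): with DRBD 8.x,
# one way to confirm that both nodes really are primary once this clone is running
# is to check /proc/drbd on each node and look for "ro:Primary/Primary":
#
#   cat /proc/drbd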

 
Changed:
<
<
# We want to wait before assigning IPs to a node until we know that
# Admin has been promoted to master on that node.

configure order AdminBeforeMainIP inf: Admin:promote MainIPGroup

# I like these commands, so commit them to the running configuration.

cib commit drbd

# Things look good, so let's add another disk resource. I defined another drbd resource
# with some spare disk space, called "work". The idea is that I can play with alternate
# virtual machines and save them on "work" before I copy them to the more robust "admin".

configure primitive WorkDrbd ocf:linbit:drbd params drbd_resource=work op monitor interval=60s
configure master Work WorkDrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false

# I prefer the work directory to be on the main admin box, but it doesn't have to be. "500:" is a
# weighting factor; compare it to "inf:" (for infinity) which is used in most of these commands.

configure colocation WorkPrefersMain 500: Work:Master MainIPGroup

# Given a choice, try to put the Admin:Master on hypatia

configure location DefinePreferredMainNode Admin 100: hypatia.nevis.columbia.edu

>
>
configure show
 
Added:
>
>
# Looks good, so commit the change.
cib commit drbd
quit
Changed:
<
<
# Now try a resource that depends on ordering: On the node that has the master
# resource for "work," mount that disk image as /work.
>
>
# Now define resources that depend on ordering.
 crm
Changed:
<
<
cib new workdisk
>
>
cib new disk

# The DRBD is available to the system. The next step is to tell LVM
# that the volume group ADMIN exists on the disk.

 
Changed:
<
<
# To find out that there was an "ocf:heartbeat:Filesystem" that I could use,
>
>
# To find out that there was a resource "ocf:heartbeat:LVM" that I could use,
# I used the command:

ra classes

Line: 255 to 227
  ra list ocf heartbeat

Changed:
<
<
# To find out what Filesystem parameters I needed, I used:
>
>
# To find out what LVM parameters I needed, I used:
 
Changed:
<
<
ra meta ocf:heartbeat:Filesystem
>
>
ra meta ocf:heartbeat:LVM
  # All of the above led me to create the following resource configuration:

Changed:
<
<
configure primitive WorkDirectory ocf:heartbeat:Filesystem params device="/dev/drbd2" directory="/work" fstype="ext4"
>
>
primitive AdminLvm ocf:heartbeat:LVM params volgrpname="ADMIN"
 
Changed:
<
<
# Note that I had previously created an ext4 filesystem on /dev/drbd2.
>
>
# After I set up the volume group, I want to mount the logical volumes
# (partitions) within the volume group. Here's one of the partitions, /usr/nevis;
# note that I begin all the filesystem resources with FS so they'll be next
# to each other when I type "crm configure show".
 
Changed:
<
<
# Now specify that we want this to be on the same node as Work:Master:
>
>
primitive FSUsrNevis ocf:heartbeat:Filesystem params device="/dev/mapper/ADMIN-usr" directory="/usr/nevis" fstype="gfs2" options="defaults,noatime,nodiratime"
 
Changed:
<
<
configure colocation DirectoryWithWork inf: WorkDirectory Work:Master
>
>
# I have similar definitions for the other logical volumes in volume group ADMIN:
# /mail, /var/nevis, etc.
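# As a hedged illustration (the logical-volume device name below is an assumption;
# "crm configure show" has the actual definitions), the /mail resource would look
# something like:
#
#   primitive FSMail ocf:heartbeat:Filesystem params device="/dev/mapper/ADMIN-mail" directory="/mail" fstype="gfs2" options="defaults,noatime,nodiratime"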
 
Changed:
<
<
# One more thing: It's important that we not try to mount the directory
# until after Work has been promoted to master on the node.
>
>
# Now I'm going to define a resource group. The following command means:
# - Put all these resources on the same node;
# - Start these resources in the order they're listed;
# - The resources depend on each other in the order they're listed. For example,
#   if AdminLvm fails, FSUsrNevis will not start, or will be stopped if it's running.
 
Changed:
<
<
# A score of "inf" means "infinity"; if the DRBD resource 'work' can't
# be set up, then don't mount the /work partition.
>
>
group FilesystemGroup AdminLvm FSUsrNevis FSVarNevis FSVirtualMachines FSMail FSWork
 
Changed:
<
<
configure order WorkBeforeDirectory inf: Work:promote WorkDirectory:start
>
>
# We want these logical volumes (or partitions or filesystems) to be available
# on both nodes. To do this, we define a clone resource.
 
Changed:
<
<
cib commit workdisk
quit

# We've made the relatively-unimportant DRBD resource 'work' function. Let's do it for 'admin'.
# Previously I created some LVM volumes on the admin DRBD master. We need to use a
# resource to activate them, but we can't activate them until after the Admin:Master
# is loaded.

crm
cib new lvm

>
>
clone FilesystemClone FilesystemGroup meta interleave="true"
 
Changed:
<
<
# Activate the LVM volumes, but only after DRBD has figured out where
# Admin:Master is located.
>
>
# One more thing: It's important that we not try to set up the filesystems
# until the DRBD admin resource is running on a node, and has been
# promoted to master.
 
Changed:
<
<
configure primitive Lvm ocf:heartbeat:LVM params volgrpname="admin"
configure colocation LvmWithAdmin inf: Lvm Admin:Master
configure order AdminBeforeLvm inf: Admin:promote Lvm:start
>
>
# A score of "inf" means "infinity"; if the DRBD resource 'AdminClone' can't
# be promoted, then don't start the 'FilesystemClone' resource.
 
Changed:
<
<
cib commit lvm
>
>
colocation Filesystem_With_Admin inf: FilesystemClone AdminClone:Master
order Admin_Before_Filesystem inf: AdminClone:promote FilesystemClone:start
 
Changed:
<
<
# Go back to the actual, live configuration

cib use live

# See if everything is working

configure show
status

# Go back to the shadow for more commands.

cib use lvm

# We have a whole bunch of filesystems on the "admin" volume group. Let's
# create the commands to mount them.

# The 'timeout="240s"' piece is to give a four-minute interval to start
# up the mount. This allows for a "it's been too long, do an fsck" check
# on mounting the filesystem.

# We also allow five minutes for the unmounting to stop, just in case
# it's taking a while for some job on the server to let go of the mount.
# It's better that it take a while to switch over the system service
# than for the mount to be forcibly terminated.

configure primitive UsrDirectory ocf:heartbeat:Filesystem params device="/dev/admin/usr" directory="/usr/nevis" fstype="ext4" op start interval="0" timeout="240s" op stop interval="0" timeout="300s"

configure primitive VarDirectory ocf:heartbeat:Filesystem params device="/dev/admin/var" directory="/var/nevis" fstype="ext4" op start interval="0" timeout="240s" op stop interval="0" timeout="300s"

configure primitive MailDirectory ocf:heartbeat:Filesystem params device="/dev/admin/mail" directory="/mail" fstype="ext4" op start interval="0" timeout="240s" op stop interval="0" timeout="300s"

configure primitive XenDirectory ocf:heartbeat:Filesystem params device="/dev/admin/xen" directory="/xen" fstype="ext4" op start interval="0" timeout="240s" op stop interval="0" timeout="300s"

configure group AdminDirectoriesGroup UsrDirectory VarDirectory MailDirectory XenDirectory

# We can't mount any of them until LVM is set up:

configure colocation DirectoriesWithLVM inf: AdminDirectoriesGroup Lvm
configure order LvmBeforeDirectories inf: Lvm AdminDirectoriesGroup

cib commit lvm

>
>
cib commit disk
  quit

# Some standard Linux services are under corosync's control. They depend on some or

Line: 511 to 432
# /home/bin/nut.sh on both hypatia and orestes; there are appropriate links
# to this script from the stonith/external directory.
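# As a hedged illustration (the plugin directory below is the usual cluster-glue
# location on 64-bit RHEL-type systems; check the actual path on the nodes), an
# external STONITH script is typically made visible to the cluster with a link like:
#
#   ln -s /home/bin/nut.sh /usr/lib64/stonith/plugins/external/nut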
Deleted:
<
<
# By the way, I sent the script to Cluster Labs, who accepted it.
# The next generation of their distribution will include the script.
 # The following commands implement the STONITH mechanism for our cluster:

crm

Line: 549 to 467
  crm configure property stonith-enabled=true
Deleted:
<
<
# At this point, the key elements of the high-availability configuration have
# been set up. There is one non-critical frill: One node (probably hypatia) will be
# running the important services, while the other node (probably orestes) would
# be "twiddling its thumbs." Instead, let's have orestes do something useful: execute
# condor jobs.

# For orestes to do this, it requires the condor service. It also requires that
# library:/usr/nevis is mounted, the same as every other batch machine on the
# Nevis condor cluster. We can't use the automount daemon (amd) to do this for
# us, the way we do on the other batch nodes; we have to make corosync do the
# mounts.

crm
cib new condor

# Mount library:/usr/nevis. A bit of a name confusion here: there's a /work
# partition on the primary node, but the name "LibraryOnWork" means that
# the nfs-mount of /usr/nevis is located on the secondary or "work" node.

configure primitive LibraryOnWork ocf:heartbeat:Filesystem params device="library:/usr/nevis" directory="/usr/nevis" fstype="nfs"

# Corosync must not NFS-mount library:/usr/nevis on a system that has already
# mounted /usr/nevis directly as part of AdminDirectoriesGroup
# described above.

# Note that if there's only one node remaining in the high-availability
# cluster, it will be running the resource AdminDirectoriesGroup, and
# LibraryOnWork will never be started. This is fine; if there's only one
# node left, I don't want it running condor jobs.

configure colocation NoRemoteMountWithDirectories -inf: LibraryOnWork AdminDirectoriesGroup

# Determine on which machine we mount library:/usr/nevis after the NFS
# export of /usr/nevis has been set up.

configure order NfsBeforeLibrary inf: Nfs LibraryOnWork

# Define the IPs associated with the backup system, and group them together.
# This is a non-critical definition, and I don't want to assign it until the more important
# "secondary" resources have been set up.

configure primitive Burr ocf:heartbeat:IPaddr2 params ip=129.236.252.10 cidr_netmask=32 op monitor interval=30s
configure primitive BurrLocal ocf:heartbeat:IPaddr2 params ip=10.44.7.10 cidr_netmask=32 op monitor interval=30s
configure group AssistantIPGroup Burr BurrLocal

colocation AssistantWithLibrary inf: AssistantIPGroup LibraryOnWork
order LibraryBeforeAssistant inf: LibraryOnWork AssistantIPGroup

# The standard condor execution service. As with all the batch nodes,
# I've already configured /etc/condor/condor_config.local and created
# scratch directories in /data/condor.

configure primitive Condor lsb:condor

# If we're able to mount library:/usr/nevis, then it's safe to start condor.
# If we can't mount library:/usr/nevis, then condor will never be started.
# (We stated above that AssistantIPGroup won't start until after LibraryOnWork).

configure colocation CondorWithAssistant inf: Condor AssistantIPGroup
configure order AssistantBeforeCondor inf: AssistantIPGroup Condor

cib commit condor
quit

 

META TOPICMOVED by="WilliamSeligman" date="1348092384" from="Nevis.CorosyncDualPrimaryConfiguration" to="Nevis.PacemakerDualPrimaryConfiguration"
 