Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 51 to 51
Changed:
< < The price to be paid for all this sophistication is an increase in complexity. This page (and its companion page on the detailed Corosync configuration) are an attempt to explain the details.
> > The price to be paid for all this sophistication is an increase in complexity. This page (and its companion page on the detailed corosync configuration) are an attempt to explain the details.
Configuration
Line: 109 to 109
Most of the resources are controlled by scripts provided as part of the Pacemaker/Corosync package. The resources that begin with lsb:: (Linux Standard Base) are controlled by the standard scripts found in /etc/init.d/ on most Linux systems.
Changed:
< < The entire configuration is spelled out in (excruciating?) detail on a separate Corosync configuration page.
> > The entire configuration is spelled out in (excruciating?) detail on a separate corosync configuration page.
Services controlled by corosync:
Line: 183 to 183
http://toic.org/2008/09/22/preventing-ip-conflicts-in-xen/
Added:
> >
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 24 to 24
Changed:
< <
> >
This configuration worked well for a few years, but some issues arose over time:
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 107 to 107
In HA terms, a "resource" means "anything you want to keep available all the time." What follows is an outline of the resources configured for our HA cluster. In this outline, an indent means that the resource depends on one above it; for example, the mail-server virtual machine won't start if NFS is not available; NFS won't start if /var/lib/nfs is not available.
Added:
> > Most of the resources are controlled by scripts provided as part of the Pacemaker/Corosync package. The resources that begin with lsb:: (Linux Standard Base) are controlled by the standard scripts found in /etc/init.d/ on most Linux systems.
The entire configuration is spelled out in (excruciating?) detail on a separate Corosync configuration page.
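The lsb:: resources are managed through ordinary init scripts. As we understand Pacemaker's LSB support, the cluster decides whether such a resource is up by running the script with the status argument and reading the exit code. The following is a minimal Python sketch of that check. It assumes the exit-code table from the LSB Core specification, and the function names are hypothetical. It is not Pacemaker's own implementation.

```python
import subprocess

# Exit codes for "<init script> status", per the LSB Core specification.
LSB_STATUS = {
    0: "running",
    1: "dead, but pid file exists",
    2: "dead, but lock file exists",
    3: "not running",
    4: "unknown",
}


def describe_status(exit_code: int) -> str:
    """Translate an LSB 'status' exit code into a readable state."""
    # Other codes are reserved by the LSB; treat them as unknown.
    return LSB_STATUS.get(exit_code, "unknown")


def is_running(exit_code: int) -> bool:
    """Only exit code 0 means the service is up."""
    return exit_code == 0


def check_service(name: str, init_dir: str = "/etc/init.d") -> str:
    """Run '<init_dir>/<name> status' and describe the result.

    Returns a note instead of raising if the script is missing or not executable.
    """
    try:
        result = subprocess.run(
            [f"{init_dir}/{name}", "status"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return "no usable init script"
    return describe_status(result.returncode)
```

On a two-node cluster, we would expect check_service("nfs") to report "running" only on the node that currently holds lsb::nfs, and "not running" (exit code 3) on the other node.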
Line: 177 to 179
http://virt-manager.et.redhat.com/download.html http://wiki.xensource.com/xenwiki/XenNetworking
Changed:
< < http://toic.org/2008/10/06/multiple-network-interfaces-in-xen/
> > http://toic.org/2008/10/06/multiple-network-interfaces-in-xen/ http://toic.org/2008/09/22/preventing-ip-conflicts-in-xen/
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 23 to 23
In the 2000s, nevis1 was gradually replaced by many Linux boxes. Administrative services were moved to separate systems, typically one service per box; e.g., there was a mail server, a DNS
Changed:
< <
> >
Line: 65 to 65
These services are not part of the HA cluster because:
Changed:
< <
> >
The HA cluster
Line: 164 to 164
http://www.clusterlabs.org/wiki/Main_Page http://www.clusterlabs.org/rpm/ http://theclusterguy.clusterlabs.org/post/178680309/configuring-heartbeat-v1-was-so-simple
Changed:
< < http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/
> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ http://www.ourobengr.com/ha
DRBD
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 28 to 28
This configuration worked well for a few years, but some issues arose over time:
Changed:
< <
> >
Line: 105 to 105
Resource configuration
Changed:
< < In HA terms, a "resource" means "anything you want to keep available all the time." What follows is an outline of the resources configured for our HA cluster. In this outline, an indent means that the resource depends on one above it; for example, the mail-server virtual machine won't start if NFS is not available; NFS won't start if /var/lib//nfs is not available.
> > In HA terms, a "resource" means "anything you want to keep available all the time." What follows is an outline of the resources configured for our HA cluster. In this outline, an indent means that the resource depends on one above it; for example, the mail-server virtual machine won't start if NFS is not available; NFS won't start if /var/lib/nfs is not available.
The entire configuration is spelled out in (excruciating?) detail on a separate Corosync configuration page.
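The dependency rule in the paragraph above (the mail-server VM needs NFS, which needs /var/lib/nfs) amounts to a topological ordering of the resources. The following is a minimal Python sketch. It assumes a simplified chain in which /var/lib/nfs sits on the LVM volume, and that volume in turn needs the DRBD admin partition to be master. The resource labels are illustrative. Pacemaker itself expresses this through its own groups and order/colocation constraints, not through code like this.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each resource maps to the set of resources it depends on, i.e. the entry
# it is indented under in the outline. Assumed chain, for illustration only.
DEPENDS_ON = {
    "LVM (admin partition)": {"Admin:Master (DRBD)"},
    "Filesystem /var/lib/nfs": {"LVM (admin partition)"},
    "lsb::nfs": {"Filesystem /var/lib/nfs"},
    "Xen VM franklin (mail)": {"lsb::nfs"},
}


def start_order(deps):
    """Order resources so each one starts only after everything it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(deps).static_order())


def stop_order(deps):
    """Stop in reverse: dependents go down before the things they rely on."""
    return start_order(deps)[::-1]
```

For this linear chain, start_order(DEPENDS_ON) yields a single order: the DRBD master comes first and the franklin VM comes last. Stopping runs through the same list backwards.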
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 36 to 36
Tools
Changed:
< < Over the last few years, there has been substantial work done in the open-source community towards high-availability servers. To put it simply, a service can be offered by a machine on a high-availability (HA) cluster. If that machine fails, the service automatically transfers to another machine on the HA cluster. The software packages used to implement HA on our cluster is Corosync with Pacemaker
> > Over the last few years, there has been substantial work done in the open-source community towards high-availability servers. To put it simply, a service can be offered by a machine on a high-availability (HA) cluster. If that machine fails, the service automatically transfers to another machine on the HA cluster. The software packages used to implement HA on our cluster are Corosync with Pacemaker
Another open-source development was virtual machines for Linux. If you've ever used VMware
Line: 46 to 46
Changed:
< <
> >
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 49 to 49
Changed:
< <
> >
The price to be paid for all this sophistication is an increase in complexity. This page (and its companion page on the detailed Corosync configuration) are an attempt to explain the details.
Line: 69 to 69
The HA cluster
Changed:
< < The two high-availability servers are hypatia and orestes.
> > The two high-availability servers are hypatia and orestes. For the sake of simplicity, hypatia is normally the "main" server and orestes the backup server.
Changed:
< <
> > Disk configuration
A sketch of the disk organization of the high-availability servers:
Changed:
< <
> > Text description:
Network configuration
Both hypatia and orestes have two Ethernet
hypatia and orestes, but this would not be useful if the HA services were moved from one system to the other. Among the HA resources (see below) that are managed by the systems are "generic" IP addresses assigned to the cluster. The IP name hamilton.nevis.columbia.edu always points to the system offering the important cluster resources; the name burr.nevis.columbia.edu always points to the system offering "scratch" resources. Of course, if one of these systems goes down, then these two aliases will point to the same box.
In general, this means that if you need to access the system offering the main cluster resources, always use the name hamilton.
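The rule for the two floating names can be modeled in a few lines. The following is a minimal Python sketch. It assumes exactly two nodes and ignores resource stickiness, so hamilton is modeled as returning to hypatia as soon as hypatia comes back, which a real cluster need not do. The function name is hypothetical, and the code performs no DNS or IP operations.

```python
NODES = ("hypatia", "orestes")
USUAL_MAIN = "hypatia"  # hypatia is normally the "main" server


def alias_targets(up):
    """Map the floating names to nodes, given the set of nodes that are up.

    hamilton follows the main cluster resources; burr follows the "scratch"
    resources and prefers the other box. Stickiness and failback are ignored.
    """
    live = [n for n in NODES if n in up]
    if not live:
        raise RuntimeError("no cluster node is up")
    main = USUAL_MAIN if USUAL_MAIN in live else live[0]
    others = [n for n in live if n != main]
    # With only one node left, both aliases land on the same box.
    return {"hamilton": main, "burr": others[0] if others else main}
```

With both nodes up, this gives hamilton on hypatia and burr on orestes. If either node is down, both names resolve to the survivor. This is why "always use the name hamilton" keeps working through a failover.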
Resource configuration
In HA terms, a "resource" means "anything you want to keep available all the time." What follows is an outline of the resources configured for our HA cluster. In this outline, an indent means that the resource depends on one above it; for example, the mail-server virtual machine won't start if NFS is not available; NFS won't start if /var/lib//nfs is not available.
The entire configuration is spelled out in (excruciating?) detail on a separate Corosync configuration page.
Services controlled by corosync:
main node:
  Admin:Master = the DRBD "admin" partition's main image (Constraint: +100 to be on hypatia)
  MainIPGroup:
    IP = 129.236.252.11 (hamilton = library = time = print)
    IP = 10.44.7.11
    IP = 10.43.7.11
  LVM = makes the following logical volumes on the admin partition visible
    Filesystem: /usr/nevis
    Filesystem: /mail
    Filesystem: /var/nevis
    Filesystem: /var/lib/nfs
  lsb::cups
  lsb::xinetd (includes tftp and ftp)
  lsb::dhcp (ln -sf /var/nevis/dhcpd /var/lib/dhcpd)
  lsb::nfs
  Xen virtual machines:
    sullivan (mailing list)
    tango (Samba)
    ada (= www; web server)
    franklin (= mail; mail server)
    hogwarts (= staff accounts for non-login users)
  Work:Master = the DRBD "work" partition's main image
    Filesystem: /work
assistant node:
  AssistantIPGroup (Constraint: -1000 to be on same system as hamilton)
    IP = 129.236.252.10 (burr = assistant)
    IP = 10.44.7.10
  mount library:/usr/nevis
  lsb::condor (Constraint: -INF for AdminDirectoriesGroup; if everything is running on one box, stop running condor)
On both systems:
  the STONITH resources.
References
These are the web sites I used to develop the HA cluster configuration at Nevis.
Corosync/Pacemaker
http://www.clusterlabs.org/wiki/Main_Page
http://www.clusterlabs.org/rpm/
http://theclusterguy.clusterlabs.org/post/178680309/configuring-heartbeat-v1-was-so-simple
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/
DRBD
http://www.drbd.org/home/feature-list/
http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0
http://howtoforge.com/highly-available-nfs-server-using-drbd-and-heartbeat-on-debian-5.0-lenny
Xen virtual machines
http://virt-manager.et.redhat.com/download.html
http://wiki.xensource.com/xenwiki/XenNetworking
http://toic.org/2008/10/06/multiple-network-interfaces-in-xen/
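The constraint scores in the resource outline combine additively:

- +100 prefers hypatia for the admin partition.
- -1000 keeps burr off hamilton's box.
- -INF keeps condor away from AdminDirectoriesGroup.

The following is a simplified Python sketch. It assumes Pacemaker's conventions that INFINITY is 1,000,000 and that -INFINITY wins over +INFINITY. In this toy model, any total above -INF is only a preference. That matches the outline's description of burr sharing a box when one system is down. The function names are hypothetical, and this is not Pacemaker's actual allocation algorithm.

```python
INF = 1_000_000  # Pacemaker's value for INFINITY


def add_scores(a, b):
    """Add two scores, saturating at +/-INF; -INF beats +INF."""
    if a <= -INF or b <= -INF:
        return -INF
    if a >= INF or b >= INF:
        return INF
    return max(-INF, min(INF, a + b))


def place(base, constraints):
    """Pick the node with the highest total score; -INF nodes are ineligible.

    base: {node: score} for the nodes that are up.
    constraints: list of {node: score} contributions added on top.
    Returns the chosen node, or None if the resource cannot run anywhere.
    """
    totals = dict(base)
    for contribution in constraints:
        for node, score in contribution.items():
            if node in totals:  # ignore nodes that are down
                totals[node] = add_scores(totals[node], score)
    eligible = {n: s for n, s in totals.items() if s > -INF}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

Here is how the outline's three constraints play out in this model:

- place({"hypatia": 0, "orestes": 0}, [{"hypatia": 100}]) picks hypatia for the main node's resources.
- The -1000 colocation term pushes burr to orestes, but burr can still land on hypatia if orestes is down.
- When orestes is the only node left and AdminDirectoriesGroup is running there, condor's -INF term makes place return None, so condor stops.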
Changed:
< <
> >
Line: 1 to 1
Nevis particle-physics administrative cluster
Line: 32 to 32
Changed:
< < Consolidating systems
> > High-availability
Tools
Over the last few years, there has been substantial work done in the open-source community towards high-availability servers. To put it simply, a service can be offered by a machine on a high-availability (HA) cluster. If that machine fails, the service automatically transfers to another machine on the HA cluster. The software packages used to implement HA on our cluster is Corosync with Pacemaker
Line: 50 to 52
Added:
> > Configuration
The non-HA server
First, let's go over an administrative server that is not part of the HA cluster: hermes. This server provides the following functions:
These services are not part of the HA cluster because:
The HA cluster
The two high-availability servers are hypatia and orestes.
Line: 1 to 1
Added:
> > Nevis particle-physics administrative cluster
Background
A single system
In the 1990s, Nevis computing centered on a single computer, nevis1. Most users used this machine to analyze data, access their e-mail, set up web sites, etc. Although the system (an SGI Challenge XL) was relatively powerful for its time, this organization had some disadvantages:
A distributed cluster
In the 2000s, nevis1 was gradually replaced by many Linux boxes. Administrative services were moved to separate systems, typically one service per box; e.g., there was a mail server, a DNS
Consolidating systems
Over the last few years, there has been substantial work done in the open-source community towards high-availability servers. To put it simply, a service can be offered by a machine on a high-availability (HA) cluster. If that machine fails, the service automatically transfers to another machine on the HA cluster. The software packages used to implement HA on our cluster is Corosync with Pacemaker