Nevis particle-physics administrative cluster

This is a description of the organization of the administrative computers on the Nevis Linux cluster. The emphasis is on describing the high-availability cluster, because such clusters are relatively new to the world of physics.

Background

A single system

In the 1990s, Nevis computing centered on a single computer, nevis1. The majority of the users used this machine to analyze data, access their e-mail, set up web sites, etc. Although the system (an SGI Challenge XL) was relatively powerful for its time, this organization had some disadvantages:

  • All the users had to share the processing queues. It was possible that one user could dominate the computer, preventing anyone else from running their own analysis jobs.
  • If it became necessary to restart the computer, it had to be scheduled in advance (typically two weeks), since such a restart would affect almost everyone at Nevis.
  • If the system's security became compromised, it affected everyone and everything on the system.

A distributed cluster

In the 2000s, nevis1 was gradually replaced by many Linux boxes. Administrative services were moved to separate systems, typically one service per box; e.g., there was a mail server, a DNS server, a Samba server, etc. Each working group at Nevis purchased its own server and managed its own disk space. The above issues were resolved:

  • Jobs could be sent to the condor batch system, so that no one system would be slowed down due to a user's jobs.
  • A single computer system could be restarted without affecting most of the rest of the cluster; e.g., if the mail server needed to be rebooted, it didn't affect the physics analysis.
  • If a server became compromised, e.g., the web server, the effects could be restricted to that server.

This configuration worked well for a few years, but some issues arose over time:

  • Each working group would purchase and maintain its server with its own funds. However, the administrative servers had to be purchased with Nevis' infrastructure funds. That meant that the administrative servers would be replaced rarely, if at all.
  • As a result, the administrative servers tended to be older, recycled, or inexpensive systems, with a correspondingly higher risk of failure.
  • At one point there were seven administrative servers in our computer area, each one requiring an uninterruptible power supply, and each one contributing to the power and heat load of the computer room.

High-availability

Tools

Over the last few years, there has been substantial work done in the open-source community towards high-availability servers. To put it simply, a service can be offered by a machine on a high-availability (HA) cluster. If that machine fails, the service automatically transfers to another machine on the HA cluster. The software packages used to implement HA on our cluster are Corosync with Pacemaker.
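The failover idea can be illustrated with a toy model. This is only a sketch of the concept, not how Corosync or Pacemaker actually decide where a resource runs; the function place_services and the service and node names are hypothetical.

```python
# Toy model of HA failover. This is NOT Pacemaker's scheduling logic;
# all names here are hypothetical and exist only for illustration.

def place_services(services, nodes_up):
    """Run each service on its preferred node if that node is up;
    otherwise move it to the first surviving node."""
    if not nodes_up:
        raise RuntimeError("no nodes available")
    placement = {}
    for service, preferred in services.items():
        placement[service] = preferred if preferred in nodes_up else nodes_up[0]
    return placement

# Each service names the node it would normally run on.
services = {"mail": "node-a", "dns": "node-b"}

# Both nodes healthy: every service stays on its preferred node.
normal = place_services(services, ["node-a", "node-b"])

# node-a has failed: its services transfer to node-b.
after_failure = place_services(services, ["node-b"])

print(normal)
print(after_failure)
```

In the real cluster this decision is made by the HA software itself, based on its configuration; the sketch only shows the outcome a user would observe.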

Another open-source development was virtual machines for Linux. If you've ever used VMware, you're already familiar with the concept. The software (actually, a kernel extension) used to implement virtual machines in Linux is called Xen.

The final "piece of the puzzle" is a software package plus kernel extension called DRBD. The simplest way to understand DRBD is to think of it as RAID1 between two computers: when one computer makes a change to a disk partition, the other computer makes the identical change to its copy of the partition. In this way, the disk partition is mirrored between the two systems.
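The "RAID1 between two computers" picture can be sketched in a few lines. This is a conceptual model only, assuming two in-memory byte arrays stand in for the two partitions; the class MirroredPartition is hypothetical and does not reflect DRBD's actual interface or network protocol.

```python
# Conceptual model of a mirrored partition. The class name and methods are
# hypothetical; real DRBD works at the block-device level over a network.

class MirroredPartition:
    def __init__(self, size):
        self.local = bytearray(size)  # partition on the first computer
        self.peer = bytearray(size)   # copy of the partition on the second

    def write(self, offset, data):
        """Apply the same change to both copies of the partition."""
        end = offset + len(data)
        if offset < 0 or end > len(self.local):
            raise ValueError("write outside partition")
        self.local[offset:end] = data
        self.peer[offset:end] = data  # stands in for the replicated write

    def in_sync(self):
        """True when both copies hold identical contents."""
        return self.local == self.peer

part = MirroredPartition(16)
part.write(4, b"data")
print(part.in_sync())
```

Because every write goes to both copies, either computer holds a current version of the partition if the other one fails.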

These tools can be used to solve the issues listed above:

  • Six different computers, all old, could be replaced by two new systems.
  • The systems could be new, but relatively inexpensive. If one of them failed, the other one would automatically take up the services that the first computer offered.
  • The disk images of the two computers, including the virtual machines, could be kept automatically synchronized (as opposed to being copied by a script run at regular intervals).
  • Virtual machines are essentially large (~10GB) disk files. They can be manipulated as if they were separate computers (e.g., rebooted when needed) but can also be copied like disk files:
    • If a virtual server had its security broken, it could be quickly replaced by an older, un-hacked copy of that virtual machine.
    • If a virtual server required a complicated, time-consuming upgrade, a copy could be upgraded instead, then quickly swapped with a minimal interruption.
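Since a virtual machine is essentially a disk file, the "upgrade a copy, then swap" approach can be sketched with ordinary file operations. This is a minimal illustration under stated assumptions: the file name vm.img, the helper swap_in_upgraded_copy, and the stand-in fake_upgrade are all hypothetical, and a real swap would also involve stopping and restarting the virtual machine, which is omitted here.

```python
import os
import shutil
import tempfile

def swap_in_upgraded_copy(live_image, upgrade):
    """Copy the image, run `upgrade` on the copy, then replace the original."""
    work_copy = live_image + ".upgrade"
    shutil.copy2(live_image, work_copy)  # the original stays untouched
    upgrade(work_copy)                   # the slow step only touches the copy
    os.replace(work_copy, live_image)    # quick swap: rename over the original

def fake_upgrade(path):
    # Stand-in for a long upgrade procedure (hypothetical).
    with open(path, "ab") as f:
        f.write(b" upgraded")

# Demonstrate on a tiny placeholder file instead of a ~10GB disk image.
with tempfile.TemporaryDirectory() as d:
    image = os.path.join(d, "vm.img")
    with open(image, "wb") as f:
        f.write(b"v1")
    swap_in_upgraded_copy(image, fake_upgrade)
    with open(image, "rb") as f:
        result = f.read()

print(result)
```

The time-consuming work happens while the original image is still in service; only the final rename interrupts it.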

The price to be paid for all this sophistication is an increase in complexity. This page (and its companion page on the detailed Corosync configuration) is an attempt to explain the details.

Configuration

The non-HA server

First, let's go over an administrative server that is not part of the HA cluster: hermes. This server provides the following functions:

These services are not part of the HA cluster because:

  • in the hopefully unlikely event that the HA cluster needs to be rebooted, it's nice to have the services on hermes still available during the reboot;
  • it takes a bit of time for the HA cluster to start up after, e.g., a power outage; again, it's nice to have the above services immediately available.

The HA cluster

The two high-availability servers are hypatia and orestes.

  • A sketch of the disk organization of the high-availability servers:

HAServersSketch.jpg
Topic revision: r2 - 2010-09-29 - WilliamSeligman
 