Nevis particle-physics administrative cluster
This is a description of the organization of the administrative computers on the Nevis Linux cluster. The emphasis is on the high-availability cluster, since such clusters are relatively new to the world of physics.
Background
A single system
In the 1990s, Nevis computing centered on a single computer, nevis1. Most users relied on this machine to analyze data, access their e-mail, set up web sites, etc. Although the system (an SGI Challenge XL) was relatively powerful for its time, this organization had some disadvantages:
- All the users had to share the processing queues, so a single user could dominate the computer and prevent anyone else from running their analysis jobs.
- If the system had to be restarted, the restart had to be scheduled in advance (typically two weeks), since it would affect almost everyone at Nevis.
- If the system's security became compromised, it affected everyone and everything on the system.
A distributed cluster
In the 2000s, nevis1 was gradually replaced by many Linux boxes. Administrative services were moved to separate systems, typically one service per box; e.g., there was a mail server, a DNS server, a Samba server, etc. Each working group at Nevis purchased its own server and managed its own disk space. The above issues were resolved:
- Jobs could be sent to the Condor batch system, so that no single system would be slowed down by one user's jobs.
- A single computer could be restarted without affecting most of the rest of the cluster; e.g., if the mail server needed to be rebooted, it didn't affect physics analysis.
- If a server became compromised, e.g., the web server, the effects could be restricted to that server.
This configuration worked well for a few years, but some issues arose over time:
- Each working group purchased and maintained its server with its own funds. However, the administrative servers had to be purchased with Nevis' infrastructure funds. That meant that the administrative servers were replaced rarely, if at all.
- As a result, the administrative servers tended to be older, recycled, or inexpensive systems, with a correspondingly higher risk of failure.
- At one point there were seven administrative servers in our computer area, each one requiring an uninterruptible power supply, and each one contributing to the power and heat load of the computer room.
Consolidating systems
Over the last few years, there has been substantial work in the open-source community on high-availability servers. Put simply, a service can be offered by a machine on a high-availability (HA) cluster; if that machine fails, the service automatically transfers to another machine in the cluster. The software used to implement HA on our cluster is Corosync with Pacemaker.
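As a sketch of what "a service that can transfer between machines" looks like in practice: on Pacemaker clusters managed with the pcs tool, a "floating" IP address can be defined as a cluster resource with a single command. The resource name and address below are hypothetical, not our actual configuration:

    # Define a floating IP address as a cluster resource. Pacemaker
    # checks it every 30 seconds; if the node hosting it fails, the
    # same address is brought up on the surviving node.
    pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
        ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s

Services reached through that address keep working after a failover, because clients never see the change of machine.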
Another open-source development was virtual machines for Linux. If you've ever used VMware, you're already familiar with the concept. The software (actually, a kernel extension) used to implement virtual machines in Linux is called Xen.
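For day-to-day administration, a Xen guest (a "domain") is handled from the host with the xl toolstack (older Xen releases used equivalent xm commands). The domain name and configuration path below are hypothetical:

    # List the virtual machines ("domains") running on this host
    xl list
    # Boot a guest from its configuration file
    xl create /etc/xen/mail.cfg
    # Shut the guest down cleanly, as if it were a separate computer
    xl shutdown mail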
The final "piece of the puzzle" is a software package plus kernel extension called
DRBD
. The simplest way to understand DRBD is to think of it as
RAID1
between two computers: when one computer makes a change to a disk partition, the other computer makes the identical change to its copy of the partition. In this way, the disk partition is mirrored between the two systems.
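A DRBD resource is described by a small configuration file, identical on both machines, that names the two hosts, their backing partitions, and the addresses they use to exchange writes. A minimal sketch; the host names, devices, and addresses are hypothetical:

    # /etc/drbd.d/r0.res -- mirror a partition between two hosts
    resource r0 {
      protocol C;               # synchronous: a write completes only
                                # after it reaches both hosts
      on serverA {
        device    /dev/drbd0;   # the mirrored device the system uses
        disk      /dev/sdb1;    # local backing partition
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on serverB {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }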
These tools can be used to solve the issues listed above:
- Six different computers, all old, could be replaced by two new systems.
- The systems could be new, but relatively inexpensive. If one of them failed, the other would automatically take over the services that the first computer offered.
- The disk images of the two computers, including the virtual machines, could be kept synchronized automatically (as opposed to being copied by a script run at regular intervals).
- Virtual machines are essentially large (~10GB) disk files. They can be manipulated as if they were separate computers (e.g., rebooted when needed) but can also be copied like disk files (see the sketch after this list):
- If a virtual server had its security compromised, it could quickly be replaced by an older, uncompromised copy of that virtual machine.
- If a virtual server required a complicated, time-consuming upgrade, a copy could be upgraded instead, then swapped in with minimal interruption.
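Here is the sketch promised above. Since a guest's "disk" is just a file on the host, replacing a compromised server amounts to shuffling files; the names and paths are hypothetical:

    # Stop the compromised guest, swap in a known-good copy of its
    # disk image, and boot it again.
    xl shutdown www
    cp /vm/backups/www-clean.img /vm/images/www.img
    xl create /etc/xen/www.cfg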
The price to be paid for all this sophistication is an increase in complexity. This page (and its companion page on the detailed Corosync configuration) is an attempt to explain that complexity.