August 2009 System Upgrades
There's only a two-week "window" in which I can perform these upgrades with the least impact on the Nevis research groups.
General procedure
- System upgrades will start at 3PM on the given day.
- Unless otherwise noted, the upgrades will take 2-3 hours.
- If you see "this will affect e-mail/logins...", then:
  - The accounts listed won't be able to log in or use e-mail during the upgrade.
  - For everyone else, I'll reboot the mail and web servers sometime between 3:00 and 3:15PM; I hope this will reduce the chance of a mail-server slowdown. It's a good idea to quit your e-mail program during that time.
Schedule
These are the systems I plan to upgrade on each day.
Monday, August 10
- hogwarts.nevis.columbia.edu
- polaris.nevis.columbia.edu
This will affect e-mail/logins for grace.ho, willis, an2262, zhang.
Tuesday, August 11
- riverside.nevis.columbia.edu
- morningside.nevis.columbia.edu
This will affect the e-mail/logins for most of the Neutrino group.
Wednesday, August 12
- hypatia.nevis.columbia.edu
If all goes well, no one's e-mail will be affected, nor will any other visible aspects of the Nevis cluster.
Thursday, August 13
This will also affect the e-mail/logins for any members of ATLAS with home directories on kolya.
Monday, August 17
- hermes.nevis.columbia.edu
- sullivan.nevis.columbia.edu
- shang.nevis.columbia.edu
- han.nevis.columbia.edu
During these upgrades, condor will not be available. No Nevis mailing lists will be available. This will also affect the e-mail/logins for members of the DOE group, as well as shaevitz, annmarie, jsantini, bishop, capone, and sciulli.
Tuesday, August 18
- karthur.nevis.columbia.edu
This may cause a Nevis-wide slowdown of all systems, since karthur is the library server. This will also affect the e-mail/logins for members of D0 and those members of ATLAS with home directories on karthur, including ban.
Wednesday, August 19
- franklin.nevis.columbia.edu
This is our mail server. The upgrade will take 4-5 hours. During the upgrade, Nevis e-mail will not be available.
Incoming e-mail won't be lost. It will be stored on the backup mail server in the Nevis Annex until the mail server comes back on.
Thursday, August 20
- ada.nevis.columbia.edu
This is our web server. The upgrade will take 4-5 hours. During the upgrade, all Nevis web services (including the Wiki, meeting-room schedules, calendars, etc.) will not be available.
What is the nature of the upgrade?
Unless otherwise noted, the Nevis cluster systems are being "upgraded" from Fedora 9 to Scientific Linux 5.3. That word is in quotes because the Scientific Linux 5 packages are earlier versions than those in Fedora 9, and the distribution has fewer features.
There is no "upgrade path" from Fedora 9 to Scientific Linux 5.3; I have to completely re-install the operating system. For some systems with a complex configuration, it would take too long to re-create those configurations in SL5.3. Therefore, the following systems are being upgraded from Fedora 9 to Fedora 11, which will preserve the configuration information:
- hypatia.nevis.columbia.edu, the central admin server
- franklin.nevis.columbia.edu, the mail server
- ada.nevis.columbia.edu, the web server
- annex.nevis.columbia.edu, the Annex server
Upgrade problems
I've immediately encountered a severe problem with upgrading the Nevis servers: Scientific Linux cannot read the disk-partitioning scheme I've used.
This means that, in order to preserve the /data partition on your system, I have to copy it to another machine, then restore it again. This saturates the Nevis network, and can take a substantial portion of a day.
Please delete anything that you don't need, or can quickly restore, from the /data partition of your server. If you can, please tell me that I don't have to restore your /data partition, or tell me I can delete files left behind by summer students, etc.
Remember, the more you leave on /data, the longer your system's upgrade will take.
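To decide what to remove, it can help to see which top-level entries under /data take the most space. The following is a minimal Python sketch, not an official Nevis tool: the function names directory_sizes and largest_entries are made up for illustration, and it assumes you have read permission for the files it counts (unreadable files are skipped).

```python
import os


def directory_sizes(root):
    """Return {top-level entry under root: total size in bytes}."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_symlink():
            continue  # don't follow links out of the partition
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        if not os.path.islink(path):
                            total += os.path.getsize(path)
                    except OSError:
                        pass  # unreadable or vanished file: skip it
            sizes[entry.name] = total
    return sizes


def largest_entries(root, count=10):
    """Return the `count` largest (name, bytes) pairs, biggest first."""
    sizes = directory_sizes(root)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__" and os.path.isdir("/data"):
    for name, size in largest_entries("/data"):
        print(f"{size / 1e9:8.2f} GB  {name}")
```

Run on a server with a /data partition, this prints the ten largest entries, which are the first candidates for deletion or for a "don't restore this" note.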
The specific issue: Fedora Linux can handle logical volumes (LVM) on top of software RAID. Scientific Linux cannot. This is causing problems with hogwarts and polaris; a two-hour upgrade turned into an eight-hour nightmare. It remains to be seen whether Scientific Linux has a problem with systems that use hardware RAID or LVM created under Fedora: riverside, morningside, karthur, kolya, shang, han.
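For the curious, one way to tell whether a machine has the LVM-on-software-RAID layout is to look at which devices hold its LVM physical volumes; Linux software RAID arrays normally show up as /dev/md* devices. Here is a rough Python sketch under that assumption. It expects the text printed by "pvs --noheadings -o pv_name,vg_name" (run as root). The function name volume_groups_on_software_raid and the sample device and volume-group names are invented for illustration, and this is not necessarily how the Nevis systems are being checked.

```python
def volume_groups_on_software_raid(pvs_output):
    """Return sorted names of volume groups with a physical volume on /dev/md*.

    pvs_output is assumed to be the text printed by
    `pvs --noheadings -o pv_name,vg_name`.
    """
    groups = set()
    for line in pvs_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a physical volume not in any volume group
        device, group = fields[0], fields[1]
        if device.startswith("/dev/md"):
            groups.add(group)
    return sorted(groups)


# Hypothetical sample output; real device and group names will differ.
sample = """
  /dev/md0   vg_data
  /dev/sda2  vg_system
"""
print(volume_groups_on_software_raid(sample))  # prints ['vg_data']
```

An empty list means no volume group sits directly on a software RAID device, at least as far as this simple name-based check can tell.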