August 2009 System Upgrades

There's only a two-week "window" in which I can perform these upgrades with the least impact on the Nevis research groups.

General procedure

  • System upgrades will start at 3PM of the given day.
  • Unless otherwise noted, the upgrades will take 2-3 hours.
  • If you see "this will affect e-mail/logins...", then:
    • The accounts listed won't be able to log in or use e-mail during the upgrade;
    • For everyone else, I'll reboot the mail and web servers sometime between 3:00 and 3:15PM; I hope this will reduce the chance of a mail-server slowdown. It's a good idea for you to quit your e-mail program during that time.

Schedule

This is the list of systems I plan to upgrade on a given day.

Monday, August 10

  • hogwarts.nevis.columbia.edu
  • polaris.nevis.columbia.edu

This will affect e-mail/logins for grace.ho, willis, an2262, zhang.

Tuesday, August 11

  • riverside.nevis.columbia.edu
  • morningside.nevis.columbia.edu

This will affect the e-mail/logins for most of the Neutrino group.

Wednesday, August 12

  • hypatia.nevis.columbia.edu

If all goes well, no one's e-mail will be affected, nor will any other visible aspects of the Nevis cluster.

Thursday, August 13

  • kolya.nevis.columbia.edu

This will affect the e-mail/logins for any members of ATLAS with home directories on kolya.

Monday, August 17

  • hermes.nevis.columbia.edu
  • sullivan.nevis.columbia.edu
  • shang.nevis.columbia.edu
  • han.nevis.columbia.edu

During these upgrades, condor will not be available. No Nevis mailing lists will be available. This will also affect the e-mail/logins for members of the DOE group, as well as shaevitz, annmarie, jsantini, bishop, capone, and sciulli.

Tuesday, August 18

  • karthur.nevis.columbia.edu

This may cause a Nevis-wide slowdown of all systems, since karthur is the library server. This will also affect the e-mail/logins for members of D0 and those members of ATLAS with home directories on karthur, including ban.

Wednesday, August 19

  • franklin.nevis.columbia.edu

This is our mail server. The upgrade will take 4-5 hours. During the upgrade, Nevis e-mail will not be available.

Incoming e-mail won't be lost. It will be stored on the backup mail server in the Nevis Annex until the mail server comes back on.

Thursday, August 20

  • ada.nevis.columbia.edu

This is our web server. The upgrade will take 4-5 hours. During the upgrade, all Nevis web services (including the Wiki, meeting-room schedules, calendars, etc.) will not be available.

What is the nature of the upgrade?

Unless otherwise noted, the Nevis cluster systems are being "upgraded" from Fedora 9 to Scientific Linux 5.3. That word is in quotes because the Scientific Linux 5 packages are earlier versions than those in Fedora 9, and the distribution has fewer features.

There is no "upgrade path" from Fedora 9 to Scientific Linux 5.3; I have to completely re-install the operating system. For some systems with a complex configuration, it would take too long to re-create those configurations in SL5.3. Therefore, the following systems are being upgraded from Fedora 9 to Fedora 11, which will preserve the configuration information:

  • hypatia.nevis.columbia.edu, the central admin server
  • franklin.nevis.columbia.edu, the mail server
  • ada.nevis.columbia.edu, the web server
  • annex.nevis.columbia.edu, the Annex server

Upgrade problems

I've immediately encountered a severe problem with upgrading the Nevis servers: Scientific Linux cannot read the disk-partitioning scheme I've used.

This means that, in order to preserve the /data partition on your system, I have to copy it to another machine, then restore it again. This saturates the Nevis network, and can take a substantial portion of a day.

Please delete anything that you don't need, or can quickly restore, from the /data partition of your server. If you can, please tell me that I don't have to restore your /data partition, or tell me that I can delete files left behind by summer students, etc.

Remember, the more you leave on /data, the longer your system's upgrade will take.
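
As a rough aid for deciding what to delete, the following is a minimal Python 3 sketch that lists the largest entries directly under /data. It is only an illustration: the function names dir_size and largest_entries are made up for this example, and it assumes Python 3 is available on your server and that you can read the files in question.

```python
import os
import sys


def dir_size(path):
    """Total size in bytes of the regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(top, count=10):
    """Return up to `count` (size_in_bytes, path) pairs for the entries
    directly under `top`, largest first."""
    sizes = []
    for name in os.listdir(top):
        full = os.path.join(top, name)
        if os.path.islink(full):
            continue
        if os.path.isdir(full):
            sizes.append((dir_size(full), full))
        else:
            sizes.append((os.path.getsize(full), full))
    sizes.sort(reverse=True)
    return sizes[:count]


if __name__ == "__main__":
    # Default to /data; another directory can be given on the command line.
    top = sys.argv[1] if len(sys.argv) > 1 else "/data"
    if os.path.isdir(top):
        for size, path in largest_entries(top):
            print(f"{size / 1e9:8.2f} GB  {path}")
    else:
        print(f"{top} is not a directory")
```

The sketch only reports sizes; it never deletes anything, so what to remove remains your decision.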

The specific issue: Fedora Linux can handle logical volumes (LVM) on top of software RAID. Scientific Linux cannot. This is causing problems with hogwarts and polaris; a two-hour upgrade turned into an eight-hour nightmare. It remains to be seen whether Scientific Linux has a problem with systems that use hardware RAID or LVM created under Fedora: riverside, morningside, karthur, kolya, shang, han.
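
To check whether a given system has the LVM-on-software-RAID layout described above, something like the following Python 3 sketch could be used. It is an illustration under stated assumptions, not the procedure used for these upgrades: it assumes the LVM command pvs is installed and run as root, and that software-RAID devices have names beginning with /dev/md (the usual Linux md naming). The function name pvs_on_software_raid is made up for this example. It says nothing about hardware RAID.

```python
import subprocess


def pvs_on_software_raid(pvs_output):
    """Given the text printed by `pvs --noheadings -o pv_name`, return the
    physical volumes that sit on software-RAID (md) devices.

    Assumption: md devices are named /dev/md0, /dev/md1, /dev/md/name, etc.
    """
    devices = [line.strip() for line in pvs_output.splitlines() if line.strip()]
    return [dev for dev in devices if dev.startswith("/dev/md")]


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["pvs", "--noheadings", "-o", "pv_name"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not run pvs: {err}")
    else:
        hits = pvs_on_software_raid(result.stdout)
        if hits:
            print("LVM physical volumes on software RAID:", ", ".join(hits))
        else:
            print("No LVM physical volumes found on md devices.")
```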

Topic revision: r8 - 2009-08-10 - WilliamSeligman