Difference: Ups (6 vs. 7)

Revision 7 - 2018-10-22 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Nevis UPS Management

Line: 49 to 49
 
  • The problem arises with an intermediate-length power outage (20-30 minutes). The servers shut down in response to the low-battery signal from the UPSes, but the UPSes don't have time to fully drain their batteries. As a result, the servers don't come back up when power returns: they went down via an internal "shutdown -h" command, and their AC power was never interrupted.
Changed:
<
<
To solve this problem, one node has been designated the "wake-up" box. As of Apr-2012 it's hermes01.nevis.columbia.edu, but it could be any node not connected to a UPS. It goes down when power is cut off and comes back up when power is restored. As soon as it comes back up, it sends a wake-up signal to the servers: via IPMI if the server has it, otherwise via a Wake-on-LAN signal.
>
>
To solve this problem, one node has been designated the "wake-up" box. As of Apr-2016 it's kennel00.nevis.columbia.edu, but it could be any node not connected to a UPS. It goes down when power is cut off and comes back up when power is restored. As soon as it comes back up, it sends a wake-up signal to the servers: via IPMI if the server has it, otherwise via a Wake-on-LAN signal.
  The BIOS of the "wake-up" box is set to bring up the system quickly, unlike the BIOSes of the other systems, which are set to delay for as long as possible to give the main servers a chance to come back up first. This means the "wake-up" box might come up before NIS and NFS are available on the cluster. This seems a reasonable price to pay for having the rest of the cluster come back up in a working state.
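The wake-up step described above might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script that actually runs on the wake-up box: the server names, BMC hostnames, MAC address, IPMI user and password-file path are all hypothetical placeholders, and the IPMI path assumes `ipmitool` is installed. A Wake-on-LAN "magic packet" is six 0xFF bytes followed by the target's MAC address repeated 16 times, sent here as a UDP broadcast.

```python
import socket
import subprocess

# Hypothetical inventory (names, BMC hosts and MACs are placeholders):
# servers with IPMI list their BMC host; the others list a MAC for Wake-on-LAN.
SERVERS = {
    "server-a": {"ipmi_host": "server-a-ipmi.example"},
    "server-b": {"mac": "00:11:22:33:44:55"},
}


def magic_packet(mac: str) -> bytes:
    """Wake-on-LAN magic packet: six 0xFF bytes followed by the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"bad MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet as a UDP datagram (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


def ipmi_power_on_cmd(bmc_host: str, user: str, password_file: str) -> list:
    """ipmitool command asking a server's BMC to power the chassis on."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-f", password_file, "chassis", "power", "on"]


def wake_all(servers: dict, user: str, password_file: str, dry_run: bool = True) -> list:
    """Use IPMI where available, else Wake-on-LAN; return a description of each action."""
    actions = []
    for name, info in servers.items():
        if "ipmi_host" in info:
            cmd = ipmi_power_on_cmd(info["ipmi_host"], user, password_file)
            actions.append(f"{name}: " + " ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=False)
        else:
            actions.append(f"{name}: wake-on-lan {info['mac']}")
            if not dry_run:
                send_wol(info["mac"])
    return actions


if __name__ == "__main__":
    # Dry run: only print what would be sent; pass dry_run=False to actually wake nodes.
    for line in wake_all(SERVERS, user="admin", password_file="/root/.ipmi-pass"):
        print(line)
```

The sketch defaults to a dry run so it can be inspected safely; preferring IPMI over Wake-on-LAN mirrors the order described above, since a BMC can power a chassis on regardless of the network card's wake settings.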
 
Copyright © 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.