Difference: TemperatureShutDown (3 vs. 4)

Revision 4 - 2011-07-25 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Computer room temperature and cluster shutdown

Line: 6 to 6
  As of July 2011, room 119 has two air conditioners: a main 5-ton unit that's over 18 years old, and a backup unit that's over 40 years old. Although both units are generally kept in good repair, it's become more common in recent years for one of them to fail. If both fail at the same time, the heat from the computer systems will raise the room temperature to the point where it might damage the equipment. To preserve the systems and the files on their hard drives, there's an automated procedure for shutting down the computers in response to high temperatures in the computer room.
Changed:
<
<
The idea is to try to keep the cluster in a useful state for as long as possible, while shutting down the less-necessary systems to keep the room cooler if possible. This is implemented as staged levels of escalation. Every ten minutes a script run to check the computer room's temperature:
>
>
The idea is to keep the cluster in a useful state for as long as possible, while shutting down the less-necessary systems to keep the room cool. This is implemented as staged levels of escalation. Every ten minutes, a script runs to check the computer room's temperature:
 
  1. If the temperature exceeds the threshold (currently 90 degrees), the batch nodes and the backup server are shut down.
  2. If the temperature is still over the threshold the next time the script runs, the file servers are shut down.
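
The two stages above can be sketched as a small state machine. This is only an illustration of the escalation logic, not the actual script: the function names, the stage labels, and the choice to reset to stage 0 once the temperature drops back to or below the threshold are all assumptions, and a real script would also have to read the sensor and issue the shutdown commands.

```python
THRESHOLD = 90  # degrees, the threshold quoted above

# Hypothetical labels for what is shut down at each stage.
STAGE_ACTIONS = {
    0: "none",
    1: "shut down batch nodes and backup server",
    2: "shut down file servers",
}


def next_stage(temperature, stage, threshold=THRESHOLD):
    """Return the escalation stage after one ten-minute check.

    Assumption: a reading at or below the threshold resets the stage
    to 0; the original text does not say what happens on recovery.
    """
    if temperature <= threshold:
        return 0
    # Over threshold: escalate one level per check, capped at stage 2.
    return min(stage + 1, 2)


def simulate(readings, threshold=THRESHOLD):
    """Replay a list of temperature readings, one per check."""
    stage = 0
    actions = []
    for temperature in readings:
        stage = next_stage(temperature, stage, threshold)
        actions.append(STAGE_ACTIONS[stage])
    return actions


if __name__ == "__main__":
    for action in simulate([85, 91, 92, 88]):
        print(action)
```

With the readings 85, 91, 92, 88, the sketch takes no action on the first check, shuts down the batch nodes and backup server on the second, shuts down the file servers on the third, and returns to no action on the fourth.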
 