Difference: BatchDetails (10 vs. 11)

Revision 11 - 2018-05-25 - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Condor"

Details of running condor at Nevis

Line: 33 to 33
 

Nevis software initialization

Changed:
<
<
The Nevis environment modules command requires initialization. When you login, this initialization is done for you; look at your ~/.profile file (~/.cshrc if you use tcsh). You have to explicitly include this line if you're submitting a batch job and you're using setup.
>
>
The Nevis environment modules command requires initialization. When you login, this initialization is done for you; look at your ~/.profile file (~/.cshrc if you use tcsh). You have to explicitly include this line if you're submitting a batch job and you're using module load.
 
source /usr/nevis/adm/nevis-init.sh
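As a concrete illustration, a batch job's wrapper script can source this file before any module load. This is a minimal sketch; the script name, the package name passed to module load, and the analysis command are hypothetical examples, not actual Nevis packages:

```
#!/bin/sh
# myjob.sh - hypothetical batch wrapper script.
# Batch jobs don't run your login scripts, so initialize the
# modules command explicitly (use nevis-init.csh under tcsh).
. /usr/nevis/adm/nevis-init.sh

# Now "module load" works inside the batch job.
module load root          # example package name

# ... run your actual program here ...
root -b -q myAnalysis.C   # example analysis command
```

The same idea applies to any job submitted to condor: anything your program needs from your login environment has to be set up explicitly in the script condor executes.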
Line: 46 to 46
 

Memory limits

Changed:
<
<
The systems on the condor batch cluster have enough RAM for 1GB per processing queue. This means that if your job uses more than 1GB of memory, there can be a problem. For example, if your job requires 2GB of memory and a condor batch node has 16 queues, then 16 of your jobs will require 32GB of RAM, twice as much as the machine has. The machine will start swapping memory pages continuously and essentially halt.
>
>
Many systems on the condor batch cluster have only enough RAM for 1GB per processing queue. This means that if your job uses more than 1GB of memory, there can be a problem. For example, if your job requires 2GB of memory and a condor batch node has 16 queues, then 16 of your jobs will require 32GB of RAM, twice as much as the machine has. The machine will start swapping memory pages continuously and essentially halt.
 
Changed:
<
<
To keep this from happening, condor will automatically cancel a job that requires more than 1GB of RAM. Unfortunately, condor has trouble estimating the amount of memory a running job requires: if a program uses threads, condor tends to overestimate; if a program uses shared libraries, it tends to underestimate.
>
>
To keep this from happening, condor will automatically cancel a job that requires more RAM than a queue has available. Unfortunately, condor has trouble estimating the amount of memory a running job requires: if a program uses threads, condor tends to overestimate; if a program uses shared libraries, it tends to underestimate.
  Therefore, if you find that your large simulation program is being "spontaneously" canceled, look at its memory use.
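One way to avoid a surprise cancellation is to tell condor up front how much memory the job needs. Standard HTCondor submit files accept a request_memory command for this; whether the condor version at Nevis honors it is an assumption here, and the file and executable names below are hypothetical examples:

```
# myjob.cmd - hypothetical submit file fragment.
universe       = vanilla
executable     = myjob.sh
# Ask condor to match this job only to slots with at least
# this much memory (value in MB), instead of letting it be
# canceled mid-run for exceeding the default.
request_memory = 2048
log            = myjob.log
queue
```

The job's log file (myjob.log above) records condor's running estimate of the job's image size, so it is a good first place to look when a job is "spontaneously" canceled.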
 
This site is powered by the TWiki collaboration platform, powered by Perl. Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.