---+ Using !ArCond at Nevis

!ArCond is a wrapper that uses condor to automatically parallelize your jobs. When you copy a dataset to xrootd, the data is distributed evenly over the T3 worker nodes. In addition to parallelization, !ArCond runs jobs only on the nodes where the data is stored, so there is no network load. The general !ArCond instructions are available <a href="https://atlaswww.hep.anl.gov/twiki/bin/view/Workbook/UsingPCF" target="_blank" title="Arcond User Guide">here</a>. Those instructions are specific to users at ANL, but you may still find some of the information there useful.

There are of course some drawbacks. The first is that you can't monitor how a job is progressing; you can only wait for it to finish and check the log files after the fact. Secondly, you can only run over the entire dataset (modulo telling your submission scripts to run only a certain number of events per job). Also, if the datasets you're running on are not very large (or the jobs are not very cpu intensive), then the time it takes to set up the submission scripts (which you only have to do once and can then re-use as often as you need), to submit the jobs, for condor to copy the packages and output to and from the worker nodes, and for you to combine the output (by hand) will probably be longer than the time it takes to just run the job locally. Of course, you can set up scripts to do all of the above to make things easier and quicker. In my experience, a job that normally takes more than an hour or so to run locally is worth submitting to !ArCond (it will finish there in ~10 minutes). Also, if you are submitting jobs over many different datasets, writing scripts to submit them all to !ArCond rather than running them sequentially will probably be much faster, since the worker nodes have many more cores than the 16 available interactively (which are also more consistently in use by others).
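As an illustration of that kind of scripting, a wrapper along the following lines could loop over several datasets, rewriting the =input_data= line of the =arcond.conf= file described in the submission section below and submitting one !ArCond round per dataset. This is only a hedged sketch: the dataset paths are placeholders, the =arcond.conf= syntax should be checked against the file that comes with the package, and in practice you would want to wait for one round to finish (or move the =Job/= output area aside) before submitting the next, since the output directory is re-used.

<verbatim>
#!/bin/zsh
# Hypothetical wrapper: one ArCond submission per dataset.
# The dataset paths below are placeholders -- substitute your own.
cd ArCondNevis/arc_d3pd
for DS in /data/xrootd/datasetA /data/xrootd/datasetB; do
    # rewrite the input_data line of arcond.conf (assumes a single such line)
    sed -i "s|^input_data.*|input_data = ${DS}|" arcond.conf
    arcond -allyes
done
</verbatim>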
---++ The Tutorial

This tutorial works in zsh; I'm not sure about bash. It will teach you how to submit C++ jobs that run over !D3PDs (in the <a href="https://twiki.cern.ch/twiki/bin/view/AtlasSandboxProtected/ColumbiaAnalysis" target="_blank" title="AnalysisUtilities User Guide">AnalysisUtilities</a> framework) using !ArCond. I'm fairly new to !ArCond, so it is possible that I've made mistakes (or that things could be done more efficiently). If you discover any, please let me know or update this page yourself. It's also possible to run athena on !ArCond, but I haven't tried that and don't plan to; see the general !ArCond instructions for that.

---++ Setting Up For Running !ArCond

First you need to set up !ArCond. I recommend putting the following into a script:

<verbatim>
setupATLAS   # if you've set up the T3 correctly, this should be an alias for 'source /a/data/xenia/share/atlas/ATLASLocalRootBase/user/atlasLocalSetup.sh'
localSetupGcc --gccVersion=gcc432_x86_64_slc5
localSetupROOT --rootVersion=5.26.00-slc5-gcc4.3
localSetupPython
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib/
#export LD_PRELOAD=$ROOTSYS/lib/libXrdPosixPreload.so
source /a/data/xenia/share/atlasadmin/condor/Arcond/etc/arcond/arcond_setup.sh
</verbatim>

With the subscriptions, data are transferred straight into xrootd. If you have only just copied a dataset, you'll need to wait a few hours for the database to update before continuing (otherwise !ArCond won't know that the data is available on the nodes). To check whether your data is there, do the above setup and run:

<verbatim>
arc_nevis_ls /data0/atlas/...
</verbatim>

If the data is there, this command will list the files. Note that /xrootdfs is a virtual filesystem that is useful for taking a quick look at all the files in the system; the physical filesystems are under /data0 on the worker nodes. So,

<verbatim>
ls /xrootdfs/atlas/...
</verbatim>

on xenia collates the information from

<verbatim>
ls /data0/atlas/...
</verbatim>

on the 22 worker nodes.

---++ Tutorial Package

Check out the ArCondNevis package. This directory contains the code to run the jobs (the =Analysis= package) plus the submission scripts (in =arc_d3pd=). You will probably want to do this in your xenia user directory: condor copies the output of your jobs to wherever you submit from, and if the output files are large and you submit from a karthur or kolya directory (i.e. where most of our home directories are mounted), bad things could happen when the output is copied over. Also, when you cd to your xenia user directory, make sure you DO NOT go to the full NFS path =/a/data/xenia/users/username/= but rather to =/data/users/username/=. You need to use the condor built-in file transfer mechanism, documented <a href="http://www.nevis.columbia.edu/twiki/bin/view/Nevis/DiskSharing#Let_condor_transfer_the_files" target="_blank" title="Condor file transfers">here</a>, or there's a chance you will bring the system to its knees. Then check out the package:

<verbatim>
export SVNAP=svn+ssh://svn.cern.ch/reps/apenson
cd /a/data/xenia/users/urbaniec
kinit urbaniec@CERN.CH
svn co $SVNAP/ArCondNevis
</verbatim>

Optional: if you want to run the code interactively to see what it does, execute the following commands:

<verbatim>
cd ArCondNevis/Analysis/AnalysisUtilities/goodRunsLists/cmt
make
cd ../..
make
cd ../AnalysisTemplate
make
run InputFiles.txt physics Analysis.root -isMC 1
</verbatim>

This should take a few minutes to compile and run. The output is a file called =Analysis.root= containing some histograms plus a slimmed TTree.

---++ ArCond Submission

In the =ArCondNevis/arc_d3pd= directory there is a file called =arcond.conf=. The only three important lines begin with =input_data=, where you specify the dataset (always in the form =/data/xrootd/=); =max_jobs_per_node= (remember there are 3 nodes, so multiply this number by 3 to get the degree of parallelization of your jobs); and =package_dir=, where you specify the path to the analysis package to be copied to where your jobs will run. Modify these as you see fit (if you just want to run !ArCond out of the box for the tutorial, leave them as they are).
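For orientation, the relevant entries might look something like the following. This is a hedged sketch only: the dataset and package paths are placeholders, and the exact syntax should be taken from the =arcond.conf= that comes with the checked-out package.

<verbatim>
input_data = /data/xrootd/yourDataset
max_jobs_per_node = 4
package_dir = /data/users/username/ArCondNevis/Analysis
</verbatim>

With 3 worker nodes, =max_jobs_per_node = 4= corresponds to the 12 parallel jobs you will see in the queue later in this tutorial.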
Now have a look at the =ArCondNevis/arc_d3pd/patterns= directory. Here you tell !ArCond which machines are available (and any requirements for those machines). The files all have the form =schema.site.xeniaXX.nevis.columbia.edu=. You don't need to modify these, but if nodes are ever added to the T3 site you'll need to add a corresponding file. One thing you might want to modify is uncommenting the email notification line (otherwise you'll get a ton of emails when the jobs finish; personally I let the emails come and then filter them, but this is really up to you).

Finally, have a look at the =ArCondNevis/user= directory. There should be 3 files. The most important is called =ShellScript_BASIC.sh=. Open it and skip to the part that says "user-defined part" (everything before this is !ArCond setup, e.g. copying packages to the nodes, setting up the parallelization, etc.). As you can see, it does some setup for !AnalysisUtilities, then compiles the packages, then runs the job with the following line:

<verbatim>
./run InputFiles.txt physics Analysis.root > Analysis.log 2>&1
</verbatim>

=InputFiles.txt= is the data list that is automatically created by !ArCond in the few lines above the user-defined part (using the python jobOptions in the =user= directory). There is also a file called =InputFiles.txt= by default in the local version of the !AnalysisTemplate package, but it gets overwritten in the condor job. This is how !ArCond parallelizes the jobs: it divides the files of the dataset that sit on each node into several =InputFiles.txt= lists, and !AnalysisUtilities then runs over one list in each sub-job. So if you want to use !ArCond outside of !AnalysisUtilities, keep in mind that your code needs to run over a list of the data files, and that the list needs to be called =InputFiles.txt= in the submitted job (unless, of course, you rename it in the shell script).

---++ Submitting the Jobs

The package should be plug and play: there is no need to compile anything, since compilation is in principle done on the worker nodes where the job runs. However, if you want to save time, I haven't had any issues with compiling locally and then copying over the binaries. To do that, just compile as above and comment out the compilation lines in =ShellScript_BASIC.sh=. This significantly reduces the time it takes to finish all the jobs. To run !ArCond, just do:

<verbatim>
cd ArCondNevis/arc_d3pd
arcond -allyes
</verbatim>

You can then execute =condor_q= to view the queue, and you should see 12 jobs under your username. If no one else is using condor, your jobs should start running right away and take ~10 minutes to finish. You'll get an email sent to your nevis account when the jobs are finished. At any time, from the =arc_d3pd= directory, you can execute =arc_check= or =condor_q= to see which jobs are still running ( =arc_check= will also tell you which jobs have succeeded, but only if you name your output root files =Analysis.root=). When all the jobs are finished (with status 0 in your emails, if you get them), have a look at the output:

<verbatim>
cd ArCondNevis/arc_d3pd/Job
ls
</verbatim>

You should see several directories (one for each job) with names like =run0_xenia01.nevis.columbia.edu/=. Each of these directories contains the submission and execution scripts, and it is also where the output is copied once the job finishes. In this case the output should be =Analysis.log= (the text output of the !AnalysisUtilities job) and =Analysis.root=.
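Before combining anything, it can be handy to confirm that every sub-job actually produced its output. Below is a minimal sketch, assuming the default =Analysis.root= and =Analysis.log= names used in this tutorial; the "error" pattern in the grep is only a rough heuristic.

<verbatim>
cd ArCondNevis/arc_d3pd/Job
# flag any run directory that is missing its output file
for d in run*/ ; do
    [ -f "${d}Analysis.root" ] || echo "missing output in ${d}"
done
# list log files that mention errors (case-insensitive)
grep -il error run*/Analysis.log
</verbatim>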
Unfortunately, !ArCond does not automatically combine the output at the end. This can presumably be changed, but I haven't done it (and at this point I don't plan to). To combine the output, in principle one is supposed to use =arc_add=, but that doesn't work for me. What I do instead is use =condor_q= to make sure all my jobs are done, and then do something like the following:

<verbatim>
cd ArCondNevis/arc_d3pd/Job
hadd -f Analysis_all.root run*/Analysis.root
</verbatim>

You'll probably see some errors related to different binning (due to the automatic rebinning in !AnalysisUtilities). This is another downside of the parallelization, and I don't know a good solution for it at the moment; in my analysis I don't use variable-size binning, so all jobs are consistent.

---++ Open Issues

A lot of the open issues with !ArCond require a greater understanding of the software than I currently have (not to mention the permissions to rewrite some of it):

   * Many of the commands don't work correctly (e.g. =arc_add=).
   * It would be nice to submit over many datasets from one =arcond.conf= file, but I don't know how to do this (comma-separating the dataset names doesn't work).
   * Automatically hadding the output would make things much more convenient.
   * Having more flexibility to run over only (or exclude) certain files could potentially be useful.

---++ See Also:

https://atlaswww.hep.anl.gov/twiki/bin/view/UsAtlasTier3/Tier3gUsersGuide

http://www.nevis.columbia.edu/twiki/bin/view/ATLAS/UsingNevisT3

-- Nevis.DustinUrbaniec - 27 Aug 2010