Difference: CompilerTools (1 vs. 3)

Revision 3 (2015-02-03) - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Compiler Tools

Line: 32 to 32
  If you've worked on more than one system on the Nevis Linux cluster, you may have encountered the following problem: You compile a program
Changed:
<
<
on a system with one version of Fedora, but find that there are
>
>
on a system with one version of Scientific Linux, but find that there are
 library incompatibilities when you try to execute the program on another version, or have problems linking with one of the ROOT or Geant4 libraries.
Line: 44 to 44
 libraries) on the applications server. I refer to them as the "standardized" compilers, since the compilers and the programs they produce should execute in the same way on all the Nevis systems.
Changed:
<
<
The available versions are accessible via the setup command:
>
>
The available versions are accessible via the command:
 
Changed:
<
<
setup gcc34   # GCC 3.4.6
setup gcc44   # GCC 4.4.0
setup gcc45   # GCC 4.5.2
setup gcc47   # GCC 4.7.2
>
>
module avail gcc
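
To actually select one of the listed versions, the standard Environment Modules commands apply. A minimal sketch (the exact module names reported by module avail may differ, so treat the name below as an assumption):

module load gcc      # load one of the standardized GCC versions listed by "module avail gcc" (module name is an assumption)
module unload gcc    # drop it again and fall back to the native compiler on the current system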
 
Deleted:
<
<
The command unsetup compiler will unset any variables associated with the above setup commands; you'll then use the native compiler on the system on which you're working.

If you just type setup, you'll see that I've prepared versions of CLHEP, Geant4, and ROOT compiled with the standardized versions of GCC. I'll generally stick to the main compiler version used by the current version of Scientific Linux adopted by the other high-energy physics laboratories.

 This is not a cure-all for the issue of mixed library versions on different systems, but it should go some way toward developing programs that will execute properly on all the systems in the batch farm.

Note that some software development environments, e.g., LArSoft, come with their own standardized compilers. Do not use the Nevis compilers with those frameworks, unless you know what you're doing.


Distributed compilation

Distcc is a distributed compiler system that's useful when compiling big projects. Instead of compiling all your programs on a single computer, distcc will distribute successive compilations onto different machines on the Nevis cluster. The typical use of distcc might be:

make -j10 CXX="distcc g++"

This assumes that the makefile (Makefile, makefile, or GNUmakefile) has been set up so that the variable CXX names your C++ compiler.
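
For reference, a minimal sketch of such a makefile; the target, file names, and flags are purely illustrative assumptions, not part of any Nevis project (recipe lines must be indented with a tab):

CXX = g++                        # overridden by "make CXX=..." on the command line
CXXFLAGS = -O2 -Wall

myprogram: main.o analysis.o     # hypothetical program and object files
	$(CXX) -o myprogram main.o analysis.o

%.o: %.cxx                       # every .cxx file is compiled through $(CXX)
	$(CXX) $(CXXFLAGS) -c $<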

The -j option specifies the number of simultaneous compilations that the make processor will allow. Given the number of systems available on the Nevis Linux cluster, the number of processing queues available, and other issues associated with compiling programs, a value of up to -j100 is not unreasonable for our cluster (but see the warning below).
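
As an aside, distcc comes with a simple text-mode monitor (assuming the monitoring tools are installed alongside distcc on the cluster); while a distributed build is running you can watch where the compilations are being sent:

distccmon-text 2     # redisplay the list of active remote compilations every 2 seconds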

Note that distcc does not distribute the pre-processing of the source code (e.g., substituting #include files) or program linking. Only the compilation process itself is distributed.

The above example command is actually quite risky on the Nevis cluster. Our systems have different versions of Scientific Linux installed, and therefore different versions of the GNU C++ compiler. The simplest solution is to use the standardized compilers. So the example command above could become:

make -j50 CXX="distcc ${GCC_DIR}/bin/g++"

Another compilation tool I've installed on the Nevis Linux cluster is ccache. This is a tool to save some of the intermediate files created during the compilation process, and re-use them if possible in subsequent re-compilations of the code.

The main use of ccache is in projects for which you find yourself typing make clean;make frequently. With ccache, make clean is not needed; instead you might type:

make CXX="ccache g++"
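
If you're curious whether the cache is actually being hit, ccache can report its statistics (a standard ccache option):

ccache -s     # show cache hit/miss statistics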

The two compiler tools distcc and ccache can be used together to greatly speed up the compilation of large projects. To combine the two examples above, one would type:

make -j50 CXX="ccache distcc ${GCC_DIR}/bin/g++"

To (hopefully) simplify the use of these tools with the standardized compilers I've prepared, I define the following variables when you set up one of the compilers described above:

distgcc   # ccache distcc ${GCC_DIR}/bin/gcc, for the C compiler
distgpp   # ccache distcc ${GCC_DIR}/bin/g++, for the C++ compiler
distgxx   # ccache distcc ${GCC_DIR}/bin/g++, for the C++ compiler

(A variable named $distg++ would create problems in most shells.)

So a shorter make command to use when compiling big projects is:

make -j50 CXX=$distgpp

For my own work, I go further and define the following in my ~/.myprofile setup script:

# Make sure I use the distcc versions of the compiler.
export CXX=${distgpp}
export CC=${distgcc}

So the command I'd actually use with those variables defined is:

make -j50

These tools can speed up the compilation of large projects by a factor of 5 to 10.

A warning about distributed compilation

A value of -j50 (which tells make to run 50 compilation processes at once) will work well with distcc if the build process has been organized in the following way:

  • Set up the files needed for the project build
  • Compile all the source code files at once
  • Link the libraries together and cleanup

An example of a project that compiles well using these tools is the Reactor Analysis Tool.

However, the build process will slow to a crawl if it's structured in this way:

  • For each library in the project:
    • Individual library setup
    • Compile programs for that library
    • Link the library

An example of a project that compiles in this manner is Geant4.

Another example of a build process that bypasses the speed improvements associated with distcc is:

  • Run a pre-process step to generate source code
  • Compile one or two programs
  • Repeat as needed

An example of a project that compiles this way is ROOT.

For projects like this, a value of -j10 or even -j5 may give the maximal compilation speed; any larger value and the computer will bog down with dozens of pre-processing or linking processes.
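
For example, for a Geant4- or ROOT-style build one might keep the standardized compiler variables described above but reduce the parallelism:

make -j5 CXX=$distgpp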

Revision 2 (2013-04-26) - WilliamSeligman

Line: 1 to 1
 
META TOPICPARENT name="Computing"

Compiler Tools

Line: 6 to 6
 
Changed:
<
<
A discussion of compiler tools available at Nevis:
>
>
A discussion of compiler tools available at Nevis.
 

Code checking

Changed:
<
<
Before you compile, you can use the cppcheck command. This command can sometimes find bugs that the compiler does not see. A typical use might be:
>
>
The cppcheck command can find problems the compiler does not detect.

A typical use might be:

 
cppcheck *.cxx

This will check all the .cxx files within the current directory.
Added:
>
>
A more practical example:
cppcheck --enable=style myCodeDir

This will check all the C++ files in the sub-directory myCodeDir.

 

Standardized compilers

If you've worked on more than one system on the Nevis Linux cluster,

Line: 35 to 47
 The available versions are accessible via the setup command:
setup gcc34   # GCC 3.4.6
Deleted:
<
<
setup gcc41 # GCC 4.1.1
 setup gcc44 # GCC 4.4.0
Added:
>
>
setup gcc45   # GCC 4.5.2
setup gcc47   # GCC 4.7.2
 

The command unsetup compiler will unset any variables associated with the above setup commands; you'll then use the native compiler on the system on which you're working.

Changed:
<
<
If you just type setup, you'll see that I've prepared versions of CLHEP, Geant4, and ROOT compiled with the standardized GCC 3.4.6. I'll generally stick to the main compiler version used by the current version of Scientific Linux adopted by the other high-energy physics laboratories.
>
>
If you just type setup, you'll see that I've prepared versions of CLHEP, Geant4, and ROOT compiled with the standardized versions of GCC. I'll generally stick to the main compiler version used by the current version of Scientific Linux adopted by the other high-energy physics laboratories.
  This is not a cure-all for the issue of mixed library versions on different systems, but it should go some way toward developing programs that will execute properly on all the systems in the batch farm.
Added:
>
>
Note that some software development environments, e.g., LArSoft, come with their own standardized compilers. Do not use the Nevis compilers with those frameworks, unless you know what you're doing.
 

Distributed compilation

Distcc

 