Conda

Conda is a popular package management system. Here are some tips and tricks for using that system on the Nevis particle-physics systems.

Before you go to conda

One thing I've noticed is that users are using conda to install packages that are already available in the Nevis environment modules. Before you use conda to have access to packages like numpy or matplotlib, consider an alternative: use this command to load the Python version and libraries that I've installed on our applications server:

module load python

This will load the latest version of Python (3.10 as of May-2022) from the Linux cluster software libraries (use module avail python to see all the available versions). You can see which packages I've installed as part of our Python distribution with:

pip freeze

Take a look. You might be surprised by what I've included. As of May-2022, the list includes:

wheel jupyter jupyterlab iminuit numpy scipy matplotlib pandas sympy terminado urllib3 tables rootpy rootkernel uproot scikit-learn tensorflow keras torch torchvision scikit-hep h5py astropy gammapy fitsio healpy astropy-healpix cython numba numba_stats

All the packages you see on that list are also available via our notebook server.

If you want a package that's not on the list, you can install it in your home directory with:

pip install --user --upgrade <package-name>
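If you script your setup, a small helper can check the pip freeze list before installing anything. This is a minimal sketch: the function name ensure_pkg is hypothetical (not a Nevis tool), and it assumes you've already run module load python so that pip belongs to that Python.

```shell
# Hypothetical helper: install a package into your home directory only if
# the currently loaded Python doesn't already provide it.
# Assumes "module load python" has been run, so "pip" belongs to that Python.
ensure_pkg() {
    # pip freeze prints lines like "numpy==1.22.3" (or "name @ url");
    # -E: extended regex, -i: case-insensitive, -q: no output.
    if pip freeze 2>/dev/null | grep -Eiq "^$1(==| @ )"; then
        echo "$1 is already available"
    else
        pip install --user --upgrade "$1"
    fi
}

# Example:
#   ensure_pkg uproot
```

The match is against the package name exactly as pip freeze prints it, so a name spelled differently (e.g., with _ instead of -) may not be recognized.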

If a package is of sufficient interest that other users may want it, let WilliamSeligman know. He'll install it on both the library server and the notebook server.

This approach has some advantages:

  • It saves disk space. If every user starts installing entire software suites in their home directories, there's less space available for other work.

  • The library server directory /usr/nevis is available to every system in the Linux cluster, including the batch nodes. It might simplify the task of making necessary software packages available for your condor jobs.

Conda and module load can be mixed, but it's tricky. If you try to use both at once, carefully consider what each will do to your $PATH and $LD_LIBRARY_PATH environment variables.
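One way to see what each tool did is to list the search path in order, since earlier entries win. A minimal sketch; the function name show_path_entries is hypothetical:

```shell
# Hypothetical helper: print each entry of a colon-separated search path on
# its own line, numbered in search order (earlier entries take precedence).
show_path_entries() {
    [ -n "$1" ] || return 0     # nothing to show for an empty/unset variable
    printf '%s\n' "$1" | tr ':' '\n' | awk '{ printf "%2d  %s\n", NR, $0 }'
}

# Compare the output before and after "module load python" or "conda activate ...":
#   show_path_entries "$PATH"
#   show_path_entries "$LD_LIBRARY_PATH"
#   type -a python      # (bash) every python on $PATH; the first one is used
```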

Setting up conda

If you need a particular version of a package, or the software you need is not a Python-based package, then it makes sense to use conda.

You don't have to install conda (or miniconda or anaconda). Conda version 4.6.14 is already installed on every system in the Nevis particle-physics Linux cluster.

The default channel you'll probably want to assign for typical scientific research packages is conda-forge:

conda config --add channels conda-forge
conda config --set channel_priority strict
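To check that the settings took effect, conda config --show can print individual configuration values. A small sketch: the function name top_channel is hypothetical, and the awk parsing assumes conda prints the channel list as YAML items such as "  - conda-forge", highest priority first.

```shell
# Hypothetical helper: print the highest-priority channel in your conda
# configuration. Assumes "conda config --show channels" prints lines like
#   channels:
#     - conda-forge
#     - defaults
top_channel() {
    conda config --show channels | awk '/^ *- / { sub(/^ *- */, ""); print; exit }'
}

# Example:
#   top_channel                          # expect: conda-forge
#   conda config --show channel_priority
```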

Your shell’s prompt will be changed by conda. Even when you’re not using conda, the text (base) will appear at the beginning of the prompt. If this doesn’t bother you, then ignore it. If it does, you can try:

conda config --set auto_activate_base false

You’ll have to log off then log in again to see the change. If you don’t want conda to alter your prompt even when you’re using an environment, this command will suppress conda’s prompt changes:

conda config --set changeps1 False

Conda environments by name

I strongly advise you to read the Conda documentation on managing environments. What follows are excerpts from that page that are the most relevant to our work at Nevis.

You can create a conda environment in your home directory with a command like this (note that the name jupyter-pyroot in the following examples is arbitrary):

conda create --name jupyter-pyroot jupyter python root

and add packages to it later; e.g.:

conda install --name jupyter-pyroot jupyterlab numpy scipy matplotlib 

Note: the packages in these examples are already available if you use module load root, which also loads a corresponding version of Python.

To activate an environment and make its packages available to you:

conda activate jupyter-pyroot

Conda environments by location

The approach in the above section is good enough if you're working on your laptop, but it's probably not the best approach when working on the Nevis cluster:

  • It uses up space in your home directory. Even a small list of conda packages takes up a few gigabytes. If everyone in a group does this, the /home partition on your login server may fill up with copies of the same environment, one per user.

  • If you want an environment accessible in a shared directory for all your condor jobs, or to share with multiple users, you'll want a different method.

That method is to define an environment by location:

# Find some place in your file-server directory hierarchy with enough space to suit your needs.
# Note that there is no machine named 'olga' at Nevis
cd /nevis/olga/data/

# Create an appropriate directory if one doesn't already exist 
# Note that the name 'myenv' is arbitrary.
mkdir myenv

# Create the conda environment within that directory.
# Again, this list of packages is an example (and unnecessary!)
conda create --prefix ./myenv jupyter python root

From that point forward, you can activate the environment by giving its full path:

conda activate /nevis/olga/data/myenv
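In a non-interactive script such as a condor job, conda activate may fail because conda's shell functions haven't been loaded; conda shell.bash hook (available since conda 4.6) prints those functions so they can be eval'd. A minimal bash sketch, where the function name activate_prefix and the script name my_analysis.py are hypothetical:

```shell
# Hypothetical helper for bash scripts (e.g., a condor job): activate a
# location-based environment by its full path.
activate_prefix() {
    # Every conda environment contains a conda-meta directory; refuse
    # anything that doesn't look like one.
    if [ ! -d "$1/conda-meta" ]; then
        echo "activate_prefix: $1 does not look like a conda environment" >&2
        return 1
    fi
    # Load conda's shell functions, which a non-interactive shell may lack.
    eval "$(conda shell.bash hook)"
    conda activate "$1"
}

# Example job script body (the path is the made-up one from above):
#   activate_prefix /nevis/olga/data/myenv || exit 1
#   python my_analysis.py
```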

Copying a conda environment

I'm going to assume a "use case": You've got a conda environment in your home directory, and you now want to move it to a location-based environment instead of a name-based one.

If you used pip to install a python package within this conda environment, it may or may not be copied over by this procedure. I've tried it a couple of times; once it worked, another time it didn't.

# See the name of any conda environments you've got
conda info --envs 

# Create the new location of the environment if it's not already there.
# Again, all these names and locations are examples.
mkdir -p /nevis/olga/data/environments

# Clone the conda environment (assume your original environment is named 'myenv')
conda create --prefix /nevis/olga/data/environments/myenv --clone myenv
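Because pip-installed packages don't always survive the clone, it may be worth comparing the two environments before deleting the original. A sketch using conda list --export, which prints one machine-readable line per package (pip-installed ones included); the function name envs_match is hypothetical:

```shell
# Hypothetical helper: compare the package list of a name-based environment
# ($1) with that of a location-based one ($2). Prints any differences and
# returns 0 only if the two lists are identical.
envs_match() {
    local old_list new_list status
    old_list=$(mktemp) && new_list=$(mktemp) || return 2
    conda list --name "$1" --export > "$old_list"
    conda list --prefix "$2" --export > "$new_list"
    diff "$old_list" "$new_list"
    status=$?
    rm -f "$old_list" "$new_list"
    return "$status"
}

# Example, using the names above:
#   envs_match myenv /nevis/olga/data/environments/myenv && echo "safe to remove myenv"
```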

After copying the environment, you can delete the old one:

conda remove --name myenv --all

Moving the default environment and package directories

By default, conda stores both its environment directory (variable envs_dirs) and its package directory (variable pkgs_dirs) in your home directory: ~/.conda. As noted above, these directories can begin to take up a substantial amount of space as you continue to work with conda environments.
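To see how much space these directories are using now, a quick check can help. This sketch assumes the default ~/.conda location; the function name conda_usage is hypothetical.

```shell
# Hypothetical helper: show how much disk space conda's package cache and
# environment directories use (default base directory: ~/.conda).
conda_usage() {
    local base d
    base=${1:-$HOME/.conda}
    for d in "$base/pkgs" "$base/envs"; do
        if [ -d "$d" ]; then
            du -sh "$d"
        fi
    done
}

# Example:
#   conda_usage          # reports ~/.conda/pkgs and ~/.conda/envs, if present
```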

One solution is to move both of these directories to a partition where you have enough space. For this example, let's assume that you've chosen to store both of these directories within /nevis/olga/data/jsmith/conda (again, this is not a directory that actually exists on any of the Nevis Linux cluster systems).

# Define a handy variable for this operation
prefix=/nevis/olga/data/jsmith/conda

# Add the new definitions to ~/.condarc
conda config --add envs_dirs ${prefix}/envs
conda config --add pkgs_dirs ${prefix}/pkgs

# Move the package directory to the new location
# (To move the environments, use the clone recipe above)
mkdir -p ${prefix}
(cd ~/.conda && tar -cf - ./pkgs) | (cd ${prefix} && tar -xf -)

# Delete the old package directory
rm -rf ~/.conda/pkgs

Any new conda environments that you create will be in the directory given by envs_dirs.

Conda and pip

Both conda and pip are package managers. It's important to recognize that they are different package managers.

It's common for python, as a package, to be included in a conda environment. If you're going to use pip within a conda environment, then the python package must be included in that environment. If it isn't, the native CentOS 7 version of pip will be used instead; it works with an old version of python, and the packages it installs will not be part of the conda environment.
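Before running pip inside an environment, you can check which pip you'd actually get. conda activate sets $CONDA_PREFIX to the environment's directory, so a pip outside that directory is a warning sign. A minimal sketch; the function name pip_in_env is hypothetical:

```shell
# Hypothetical helper: warn if the "pip" on $PATH is not the one inside the
# active conda environment ($CONDA_PREFIX is set by "conda activate").
pip_in_env() {
    local pip_path
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "pip_in_env: no conda environment is active" >&2
        return 2
    fi
    pip_path=$(command -v pip) || {
        echo "pip_in_env: no pip on \$PATH (is python installed in this environment?)" >&2
        return 2
    }
    case "$pip_path" in
        "$CONDA_PREFIX"/bin/*)
            echo "OK: $pip_path"
            return 0 ;;
        *)
            echo "pip_in_env: $pip_path is outside $CONDA_PREFIX" >&2
            return 1 ;;
    esac
}

# Example:
#   pip_in_env && python -m pip install <package-name>
```

Running pip as python -m pip is a common safeguard: it uses the pip belonging to whichever python comes first on your $PATH.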

It's best to stick with conda to install any packages. If a given package (e.g., biopython) is available via conda, it's better to use conda install biopython than pip install biopython. Once you use pip to install a package, only use pip to update that package or anything that depends on that package.

As a rule of thumb, once you use pip to modify a conda environment, stick with pip from then on.

Topic revision: r14 - 2023-05-19 - WilliamSeligman
 