Conda

Conda is a popular package management system. Here are some tips and tricks for using that system on the Nevis particle-physics systems.

Before you go to conda

One thing I've noticed is that users are using conda to install packages that are already available on systems in the Nevis Linux cluster. Before you use conda to have access to packages like numpy or matplotlib, consider an alternative: just try running ROOT with python3 and see what happens.

You can see which packages I've installed as part of a system's Python3 distribution with:

pip3 freeze

Take a look. You might be surprised by what I've included. As of May-2022, the list includes:

wheel jupyter jupyterlab iminuit numpy scipy matplotlib pandas sympy terminado urllib3 tables rootpy rootkernel uproot scikit-learn tensorflow keras torch torchvision scikit-hep h5py astropy gammapy fitsio healpy astropy-healpix cython numba numba_stats xgboost

Note: Some of the above packages are not available on CentOS 7 systems.

All the packages you see on that list are also available via our notebook server.

If you want a package that's not on the list, you can install it yourself with:

pip3 install --user --upgrade <package-name>

Note: See the HiddenPackages topic. By default, the above command will store packages and libraries in your home directory, which may be a bad idea.

If a package is of sufficient interest that other users may want it, let WilliamSeligman know. He'll install it on both the Nevis cluster and the notebook server.

This approach saves disk space. If every user starts installing entire software suites in their home directories, there's less space available for other work.

Standard conda environments

To get started with conda, it may help to use one of the "standard conda environments" available on the Nevis Linux cluster. The intent behind these environments:

  • They can ease the transition from CentOS 7 to AlmaLinux 9; a conda environment that runs on one OS will run on the other.
  • They save disk space. If what you need is already in /usr/nevis/conda/root, you don't have to create your own conda environment just to have access to numpy or matplotlib.
  • They save time. A conda environment can take a very long time to install (possibly even days!). These are already set up for you.

The basic recipe:

conda activate /usr/nevis/conda/root

This is intended to be a "plug-in replacement" for module load root in CentOS 7.

To see a complete list of all the available environments:

ls /usr/nevis/conda

As of Nov-2023, the available environments are:

  • /usr/nevis/conda/root - A fairly complete Python/ROOT environment, including all the packages in the list near the top of this topic.
  • /usr/nevis/conda/geant4 - The same as /usr/nevis/conda/root, with Geant4 included.
  • /usr/nevis/conda/root5 - An old version of ROOT (5.34) for old analysis efforts.

"I don't see a particular package that I need."

There are more packages installed in these environments than just the ones in the list above. To check whether a particular package is installed, or to check which version, use the conda list command; e.g., to check if the geant4 environment includes clhep:

conda list -p /usr/nevis/conda/geant4 | grep clhep
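Because grep matches substrings, the command above also reports any package whose name merely contains "clhep". As a minimal sketch, a small shell function can filter the conda list output for an exact package name and print its version. The function name pkg_version is hypothetical (it is not something installed at Nevis), and the sketch assumes the usual conda list layout, with the package name in the first column and the version in the second:

```shell
# Hypothetical helper: read `conda list` output on standard input and
# print the version of the package whose name exactly matches $1.
# Assumes column 1 is the package name and column 2 is the version.
pkg_version() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example use (left as a comment, since it needs conda and the Nevis environments):
#   conda list -p /usr/nevis/conda/geant4 | pkg_version clhep
```

If the package is not installed, the function prints nothing.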

"Can I install another package in this environment with conda install or pip install?"

No. You don't have write access for these directories.

However, you can duplicate an existing environment using conda create with the --clone option. This procedure is much faster than creating a conda environment from scratch. For example, assume you want to make a copy of /usr/nevis/conda/root into /nevis/olga/share/jsmith/envs/root, then add the biopython package:

# Define a variable to make things easier
prefix=/nevis/olga/share/jsmith/envs/
# Create the directory that will contain our new environment.
mkdir -p ${prefix}
# Clone one environment to another location
conda create -y --prefix ${prefix}/root --clone /usr/nevis/conda/root
# Add the new package to the environment at the new location
conda install -y --prefix ${prefix}/root biopython
# Activate the new environment
conda activate ${prefix}/root

See Managing conda environments for more hints.

Caution: Contains latest and greatest

WilliamSeligman will periodically execute the following command for each environment:

conda update -y --prefix /usr/nevis/conda/root --all --prune

This will update all the packages in the environment to their latest versions. If you need to "lock" an environment, you can either copy the existing environment as described above, or create your own environment as described below.
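Another way to keep a record of a "locked" state is to save exactly which package builds an environment contains today, so that it can be rebuilt later. The sketch below wraps conda list --explicit, which writes a spec file that conda create --file can read. The function name save_env_spec and the jsmith paths are made up for illustration. Note that an explicit spec file only records conda packages, not anything installed with pip.

```shell
# Hypothetical helper: write an explicit package list for the environment
# at prefix $1 into the file $2.
save_env_spec() {
    conda list --prefix "$1" --explicit > "$2"
}

# Example use (needs conda; the jsmith paths are illustrative only):
#   save_env_spec /usr/nevis/conda/root /nevis/olga/share/jsmith/root-spec.txt
#   conda create --prefix /nevis/olga/share/jsmith/envs/root-locked \
#       --file /nevis/olga/share/jsmith/root-spec.txt
```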

"These conda activate commands are so long!"

This may be a good time to learn about the alias command and shell initialization files.
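As a rough illustration, assuming a bash-like shell, either of the following could go in a shell initialization file such as ~/.bashrc. The names nroot and nevis_env are hypothetical; pick whatever you like.

```shell
# A plain alias shortens one specific command.
alias nroot='conda activate /usr/nevis/conda/root'

# A shell function is a little more flexible: it activates one of the
# environments under /usr/nevis/conda, defaulting to "root" when no
# name is given.
nevis_env() {
    conda activate "/usr/nevis/conda/${1:-root}"
}

# Example use:
#   nevis_env          # same as: conda activate /usr/nevis/conda/root
#   nevis_env geant4   # same as: conda activate /usr/nevis/conda/geant4
```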

As an alternative, you can append the /usr/nevis/conda directory to your list of conda environment directories:

conda config --append envs_dirs /usr/nevis/conda

Then you'll be able to just type (for example):

conda activate root

However, be careful not to create a named environment that has the same name as one in /usr/nevis/conda; for example, you may have problems if you execute conda create --name geant4.
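One way to guard against such a clash is to look for a directory of that name before creating a named environment. This is only a sketch: the function name env_name_taken is hypothetical, and it simply tests whether a directory exists.

```shell
# Hypothetical check: succeed (exit status 0) if an environment called $1
# already exists in the shared directory $2 (default: /usr/nevis/conda).
env_name_taken() {
    [ -d "${2:-/usr/nevis/conda}/$1" ]
}

# Example use before creating a named environment:
#   if env_name_taken geant4; then
#       echo "geant4 is already a shared environment; pick another name"
#   fi
```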

Setting up conda

This section and the ones following are important if:

  • You want to create your own custom environments; for example, you need a particular version of Python.
  • You want to manage the disk space used for these environments (which can be quite large).

You don't have to install conda (or miniconda or anaconda). Conda is already installed on every system in the Nevis particle-physics Linux cluster.

Note that each of the commands in this section only has to be executed once. You can check the current values of these conda options with the command:

conda config --show-sources

The default channel you'll probably want to assign for typical scientific research packages is conda-forge:

conda config --add channels conda-forge
conda config --set channel_priority strict

Note: strict priority guarantees that the packages you install within an environment will be consistent with one another. However, it does not guarantee that you'll get the latest-and-greatest versions, especially of packages like python. I've found that conda config --set channel_priority flexible will get the latest versions of packages, at the potential risk that some packages may be incompatible with each other.

Your shell’s prompt will be changed by conda. Even when you’re not using conda, the text (base) will appear at the beginning of the prompt. If this doesn’t bother you, then ignore it. If it does, you can try:

conda config --set auto_activate_base false

You’ll have to log off then log in again to see the change. If you don’t want conda to alter your prompt even when you’re using an environment, this command will suppress conda’s prompt changes:

conda config --set changeps1 False

Speed up conda

On AlmaLinux 9 systems, conda create and conda install can be glacially slow. This is because of the solver used to determine which packages are necessary to construct an environment.

There's an alternative solver installed on the AlmaLinux 9 systems on the Nevis cluster: libmamba. To use it, just execute the following command (this only has to be done once):

conda config --set solver libmamba

Conda environments by name

I strongly advise you to read the Conda documentation on managing environments. What follows are excerpts from that page that are the most relevant to our work at Nevis.

You can create a conda environment in your home directory with a command like this (note that the name jupyter-pyroot in the following examples is arbitrary):

Note: Creating environments in your home directory is a bad idea. See below, or read the HiddenPackages topic.

conda create --name jupyter-pyroot python root jupyter

and add packages to it later; e.g.,:

conda install --name jupyter-pyroot scipy matplotlib 

Note: The conda example packages above are already available as part of the standard system on most of the machines on the Nevis Linux cluster; once again, you don't need conda if the ones above are all you need. If you asked "Where's numpy in this example?", it (and others) are automatically installed by installing the root package.

To activate an environment and make its packages available to you:

conda activate jupyter-pyroot

Conda environments by location

The approach in the above section is good enough if you're working on your laptop, but it's probably not the best approach when working on the Nevis Linux cluster:

  • It uses up space in your home directory. Even a small list of conda packages takes up a few gigabytes. If everyone in a group does this, the /home partition on your login server may be filled up with multiple copies of the exact same environment for each user.
  • If you want an environment accessible in a shared directory for all your condor jobs, or to share with multiple users (as we do with /usr/nevis/conda/root above), you'll want a different method.

That method is to define an environment by location:

# Find some place in your file-server directory hierarchy with enough space to suit your needs.
# Note that there is no machine named 'olga' at Nevis
cd /nevis/olga/data/jsmith

# Create an appropriate directory if one doesn't already exist 
# Note that the name 'myenv' is arbitrary.
mkdir myenv

# Create the conda environment within that directory.
# Again, this list of packages is an example (and unnecessary!)
conda create --prefix ./myenv jupyter python root

From that point forward, you can activate the environment by referring to its location:

conda activate /nevis/olga/data/jsmith/myenv
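For batch work such as the condor jobs mentioned above, activating an environment inside a script can be awkward, because conda activate relies on shell setup that a non-interactive job may not have. One possible alternative is conda run, which executes a single command inside an environment without activating it. A minimal sketch follows; the function name run_in_env and the script name analysis.py are hypothetical, and whether this suits your jobs is something to test for yourself.

```shell
# Hypothetical wrapper: run a command inside the conda environment
# located at prefix $1; the remaining arguments are the command to run.
run_in_env() {
    env_prefix="$1"
    shift
    conda run --prefix "$env_prefix" "$@"
}

# Example use (needs conda and an existing environment):
#   run_in_env /nevis/olga/data/jsmith/myenv python3 analysis.py
```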

Copying a conda environment

I'm going to assume a "use case": You've got a conda environment in your home directory, and you now want to move it to a location-based environment instead of a name-based one.

If you used pip3 to install a python package within this conda environment, then it might be copied over by this procedure. I've tried it a couple of times; once it worked, another time it didn't.

# See the name of any conda environments you've got
conda info --envs 

# Create the new location of the environment if it's not already there.
# Again, all these names and locations are examples.
prefix=/nevis/olga/data/jsmith/environments
mkdir -p ${prefix}

# Clone the conda environment (assume your original environment is named 'myenv')
conda create --prefix ${prefix}/myenv --clone myenv

After copying the environment, you can delete the old one:

conda remove --name myenv --all
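Since the copy may not carry over everything (see the remark about pip3 above), it may be worth comparing the two environments before running the conda remove command. The sketch below compares the output of conda list --export for the old and new environments. The function name same_packages is hypothetical, and the comparison is only as good as what conda list reports.

```shell
# Hypothetical check: compare the package list of the named environment $1
# with that of the environment at prefix $2. Succeeds only if they match.
same_packages() {
    old_list=$(conda list --name "$1" --export) || return 1
    new_list=$(conda list --prefix "$2" --export) || return 1
    [ "$old_list" = "$new_list" ]
}

# Example use (names and paths are examples, as elsewhere on this page):
#   if same_packages myenv /nevis/olga/data/jsmith/environments/myenv; then
#       conda remove --name myenv --all
#   fi
```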

Moving the default environment and package directories

By default, conda stores both its environment directory (variable envs_dirs) and its package directory (variable pkgs_dirs) in your home directory: ~/.conda. As noted above and in the HiddenPackages topic, these directories can take up a substantial amount of space as you continue to work with conda environments.
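To see how much space is involved, you can measure the directories with du. The helper below is just a convenience sketch (the name dir_sizes is hypothetical); it skips any directory that doesn't exist, so it's safe to list locations you may not have. ~/.local is included in the example on the assumption that pip3 install --user put packages there, which is the usual default on Linux.

```shell
# Hypothetical helper: for each existing directory given as an argument,
# print its size in kilobytes followed by its path. Missing directories
# are silently skipped.
dir_sizes() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            du -sk "$d"
        fi
    done
}

# Example use:
#   dir_sizes ~/.conda/envs ~/.conda/pkgs ~/.local
```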

One solution is to move both of these default directories to a partition where you have enough space. For this example, let's assume that you've chosen to store both of these directories within /nevis/olga/data/jsmith/conda (again, this is not a directory that actually exists on any of the Nevis Linux cluster systems).

# Define a handy variable for this operation
prefix=/nevis/olga/data/jsmith/conda

# Add the new definitions to ~/.condarc
conda config --add envs_dirs ${prefix}/envs
conda config --add pkgs_dirs ${prefix}/pkgs

# Move the package directory to the new location
# (To move the environments, use the clone recipe above)
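# Make sure the destination exists first, so the copy can't land somewhere else.
mkdir -p ${prefix}
(cd ~/.conda && tar -cf - ./pkgs) | (cd ${prefix} && tar -xf -)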

# Delete the old package directory
rm -rf ~/.conda/pkgs

Any new conda environments that you create will be in the directory given by envs_dirs.

Conda and pip

Both conda and pip are package managers. It's important to recognize that they are different package managers.

It's common for python, as a package, to be included in a conda environment. If you're going to use pip or pip3 within a conda environment, then the python package must be included in the conda environment. If you don't do this, the "native" version of pip will be used; in CentOS 7, this will invoke an old version of python, and the outcome will not affect the conda environment.
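A quick way to tell which pip you'd be running is to check whether it lives inside the active environment. When an environment is active, conda stores its location in the CONDA_PREFIX variable, so a sketch of such a check might look like this (the function name pip_in_active_env is hypothetical):

```shell
# Hypothetical check: succeed if the pip command $1 (default: pip3) found
# on PATH lives inside the active conda environment (CONDA_PREFIX).
# Fails if no environment is active or the command is not found.
pip_in_active_env() {
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    pip_path=$(command -v "${1:-pip3}") || return 1
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use, after conda activate:
#   pip_in_active_env pip3 || echo "pip3 is the system version, not the environment's"
```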

It's best to stick with conda to install any packages. If a given package (e.g., biopython) is available via conda, it's better to use conda install biopython than pip3 install biopython. Once you use pip to install a package, only use pip to update that package or anything that depends on that package.

As a rule of thumb, once you use pip to modify a conda environment, stick with pip from then on.

Note: On AlmaLinux 9 systems, the commands pip and pip3 are the same and can be used interchangeably. On CentOS 7 systems, pip and pip3 are not the same; python and pip point to Python 2.7; python3 and pip3 point to Python 3.6. In CentOS 7, you almost certainly want to use the commands python3 and pip3.

Topic revision: r21 - 2023-11-21 - WilliamSeligman
 