Conda
Conda
is a popular package management system. Here are some tips and tricks for using that system on the Nevis particle-physics systems.
Before you go to conda
One thing I've noticed is that users are using conda to install packages that are already available on systems in the Nevis
Linux cluster. Before you use conda to have access to packages like
numpy
or
matplotlib
, consider an alternative: just try running
ROOT
with
python3
and see what happens.
You can see which packages I've installed as part of a system's Python3 distribution with:
pip3 freeze
Take a look. You might be surprised by what I've included. As of May-2022, the list includes:
wheel jupyter jupyterlab iminuit numpy scipy matplotlib pandas sympy terminado urllib3 tables rootpy rootkernel uproot scikit-learn tensorflow keras torch torchvision scikit-hep h5py astropy gammapy fitsio healpy astropy-healpix cython numba numba_stats xgboost
Note: Some of the above packages are not available on CentOS 7 systems.
All the packages you see on that list are also available via our
notebook server.
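To check for one particular package without reading the whole list, the output of pip3 freeze can be filtered. The following is a minimal sketch; the function name in_freeze is made up for this example, and it assumes the usual "name==version" lines that pip3 freeze prints.

```shell
# Hypothetical helper (not a Nevis or pip command): succeeds if the package
# named in $1 appears in "pip3 freeze"-style text read from standard input.
# Assumes lines of the form "name==version" or "name @ location".
in_freeze() {
    grep -i -q -E "^$1(==| @ )"
}

# Example use, skipped when pip3 is not available on this machine.
if command -v pip3 >/dev/null 2>&1; then
    if pip3 freeze 2>/dev/null | in_freeze numpy; then
        echo "numpy is already installed"
    fi
fi
```

The match is case-insensitive but otherwise literal, so numba_stats and numba-stats are treated as different names.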
If you want a package that's not on the list, you can install it yourself with:
pip3 install --user --upgrade <package-name>
Note: See the HiddenPackages topic. By default, the above command will store packages and libraries in your home directory, which may be a bad idea.
If a package is of sufficient interest that other users may want it, let
WilliamSeligman know. He'll install it on both the Nevis cluster and the notebook server.
This approach saves disk space. If every user starts installing entire software suites in their home directories, there's less space available for other work.
Standard conda environments
To get started with conda, it may help to use one of the "standard conda environments" available on the Nevis
Linux cluster. The intent behind these environments:
- They can ease the transition from CentOS 7 to AlmaLinux 9; a conda environment that runs on one OS will run on the other.
- They save disk space. If what you need is already in
/usr/nevis/conda/root
, you don't have to create your own conda environment just to have access to numpy or matplotlib.
- They save time. A conda environment can take a very long time to install (possibly even days!). These are already set up for you.
The basic recipe:
conda activate /usr/nevis/conda/root
This is intended to be a "plug-in replacement" for
module load root
in CentOS 7.
To see a complete list of all the available environments:
ls /usr/nevis/conda
As of Nov-2023, the available environments are:
-
/usr/nevis/conda/root
- A fairly complete Python
/ROOT
environment, including all the packages in the list near the top of this topic.
-
/usr/nevis/conda/geant4
- The same as /usr/nevis/conda/root
, with Geant4
included.
-
/usr/nevis/conda/root5
- An old version of ROOT (5.34) for old analysis efforts.
"I don't see a particular package that I need."
There are more packages installed in these environments than just the ones in the list above. To check if a particular package is installed, or to check which version, use the
conda list
command; e.g. to check if the
geant4
environment includes
clhep
:
conda list -p /usr/nevis/conda/geant4 | grep clhep
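If you are not sure which environment to look in, the same check can be run over every directory in /usr/nevis/conda. This is a sketch, assuming that each subdirectory there is a conda environment and that conda list prints the package name at the start of each line followed by spaces; the function name envs_with_pkg is hypothetical.

```shell
# Hypothetical helper: print every environment under a directory
# (default /usr/nevis/conda) whose "conda list" output contains package $1.
envs_with_pkg() {
    pkg=$1
    root=${2:-/usr/nevis/conda}
    for env in "$root"/*/; do
        # Skip the unexpanded pattern if the directory has no subdirectories.
        [ -d "$env" ] || continue
        # "^name " matches the exact package name in the first column.
        if conda list -p "$env" 2>/dev/null | grep -q "^${pkg} "; then
            printf '%s\n' "${env%/}"
        fi
    done
}

# Example use:
#   envs_with_pkg clhep
```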
"Can I install another package in this environment with conda install
or pip install
?"
No. You don't have write access for these directories.
However, you can duplicate an existing environment using
conda create
with the
--clone
option. This procedure is much faster than creating a conda environment from scratch. For example, assume you want to make a copy of
/usr/nevis/conda/root
into
/nevis/olga/share/jsmith/envs/root
, then add the
biopython
package:
# Define a variable to make things easier
prefix=/nevis/olga/share/jsmith/envs
# Create the directory that will contain our new environment.
mkdir -p ${prefix}
# Clone one environment to another location
conda create -y --prefix ${prefix}/root --clone /usr/nevis/conda/root
# Add the new package to the environment at the new location
conda install -y --prefix ${prefix}/root biopython
# Activate the new environment
conda activate ${prefix}/root
See
Managing conda environments
for more hints.
Caution: Contains latest and greatest
WilliamSeligman will periodically execute the following command for each environment:
conda update -y --prefix /usr/nevis/conda/root --all --prune
This will update all the packages in the environment to their latest versions. If you need to "lock" an environment, you can either copy the existing environment as described above, or create your own environment as described below.
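One way to record exactly what an environment contained before an update is conda's explicit specification file. The sketch below assumes you have write access to the output location; the function name and the file and directory names are examples only. As far as I know, packages installed with pip are not included in this kind of list.

```shell
# Hypothetical helper: write the exact package list (as package URLs) of the
# environment at prefix $1 into the file $2.
snapshot_env() {
    conda list --explicit --prefix "$1" > "$2"
}

# Example use (paths are illustrations):
#   snapshot_env /usr/nevis/conda/root ~/root-snapshot.txt
# An environment with the same packages can later be built from that file:
#   conda create -y --prefix /nevis/olga/share/jsmith/envs/root-locked --file ~/root-snapshot.txt
```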
"These conda activate
commands are so long!"
This may be a good time to learn about
the alias command
and
shell initialization files
.
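For instance, assuming your login shell is bash, a line like the following in ~/.bashrc defines a short command; the alias name nevisroot is made up for this example.

```shell
# Hypothetical alias name. Once this is defined, typing "nevisroot" at an
# interactive prompt runs the full activate command.
alias nevisroot='conda activate /usr/nevis/conda/root'
```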
As an alternative, you can append the
/usr/nevis/conda
directory to your list of conda environment directories:
conda config --append envs_dirs /usr/nevis/conda
Then you'll be able to just type (for example):
conda activate root
However, be careful not to create a named environment with the same name as one in
/usr/nevis/conda
; for example, you may have problems if you execute
conda create --name geant4
.
Setting up conda
This section and the ones following are important if:
- You want to create your own custom environments; for example, you need a particular version of Python.
- You want to manage the disk space used for these environments (which can be quite large).
You don't have to install conda (or miniconda or anaconda). Conda is already installed on every system in the Nevis particle-physics Linux cluster.
Note that each of the commands in this section only has to be executed once. You can check the current values of these conda options with the command:
conda config --show-sources
The default channel you'll probably want to assign for typical scientific research packages is
conda-forge
:
conda config --add channels conda-forge
conda config --set channel_priority strict
Note: strict
priority guarantees that the packages you install within an environment will be consistent with one another. However, it does not guarantee that you'll get the latest-and-greatest versions, especially of packages like python
. I've found that conda config --set channel_priority flexible
will get the latest versions of packages, at the potential risk that some packages may be incompatible with each other.
Your shell’s prompt will be changed by conda. Even when you’re not using conda, the text
(base)
will appear at the beginning of the prompt. If this doesn’t bother you, then ignore it. If it does, you can try:
conda config --set auto_activate_base false
You’ll have to log off then log in again to see the change. If you don’t want conda to alter your prompt even when you’re using an environment, this command will suppress conda’s prompt changes:
conda config --set changeps1 False
Speed up conda
On AlmaLinux 9 systems,
conda create
and
conda install
can be glacially slow. This is because of the
solver
used to determine which packages are necessary to construct an environment.
There's an alternative solver installed on the AlmaLinux 9 systems on the Nevis cluster:
libmamba
. To use it, just execute the following command (this only has to be done once):
conda config --set solver libmamba
Conda environments by name
I strongly advise you to read the Conda documentation on
managing environments
. What follows are excerpts from that page that are the most relevant to our work at Nevis.
You can create a conda environment in your home directory with a command like this (note that the name
jupyter-pyroot
in the following examples is arbitrary):
Note: Creating environments in your home directory is a bad idea. See below, or read the HiddenPackages topic.
conda create --name jupyter-pyroot python root jupyter
and add packages to it later; e.g.,:
conda install --name jupyter-pyroot scipy matplotlib
Note: The conda example packages above are already available as part of the standard system on most of the machines on the Nevis Linux cluster; once again, you don't need conda if the ones above are all you need. If you asked "Where's numpy in this example?", it (and other packages) is installed automatically when you install the root package.
To activate an environment and make its packages available to you:
conda activate jupyter-pyroot
Conda environments by location
The approach in the above section is good enough if you're working on your laptop, but it's probably not the best approach when working on the Nevis
Linux cluster:
- It uses up space in your home directory. Even a small list of conda packages takes up a few gigabytes. If everyone in a group does this, the
/home
partition on your login server may be filled up with multiple copies of the exact same environment for each user.
- If you want an environment accessible in a shared directory for all your condor jobs, or to share with multiple users (as we do with
/usr/nevis/conda/root
above), you'll want a different method.
That method is to define an environment by location:
# Find some place in your file-server directory hierarchy with enough space to suit your needs.
# Note that there is no machine named 'olga' at Nevis
cd /nevis/olga/data/jsmith
# Create an appropriate directory if one doesn't already exist
# Note that the name 'myenv' is arbitrary.
mkdir myenv
# Create the conda environment within that directory.
# Again, this list of packages is an example (and unnecessary!)
conda create --prefix ./myenv jupyter python root
From that point forward, you can activate the environment by referring to its location:
conda activate /nevis/olga/data/jsmith/myenv
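Before creating an environment this way, it can help to check how much room the target partition has. A sketch, assuming a POSIX df; the 5 GB threshold and the function name free_kb are arbitrary choices for illustration.

```shell
# Hypothetical helper: print the free space, in kilobytes, of the partition
# that holds the directory $1 (the "Available" column of POSIX-format df).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

target=/nevis/olga/data/jsmith     # example path from the text; not a real directory
need_kb=$((5 * 1024 * 1024))       # assumed threshold of 5 GB

# Only warn when the directory exists and has less room than the threshold.
if [ -d "$target" ] && [ "$(free_kb "$target")" -lt "$need_kb" ]; then
    echo "Less than 5 GB free under $target" >&2
fi
```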
Copying a conda environment
I'm going to assume a "use case": You've got a conda environment in your home directory, and you now want to move it to a location-based environment instead of a name-based one.
If you used
pip3
to install a python package within this conda environment, then it
might be copied over by this procedure. I've tried it a couple of times; once it worked, another time it didn't.
# See the name of any conda environments you've got
conda info --envs
# Create the new location of the environment if it's not already there.
# Again, all these names and locations are examples.
prefix=/nevis/olga/data/jsmith/environments
mkdir -p ${prefix}
# Clone the conda environment (assume your original environment is named 'myenv')
conda create --prefix ${prefix}/myenv --clone myenv
After copying the environment, you can delete the old one:
conda remove --name myenv --all
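Because pip-installed packages may or may not survive the copy, you might compare the two package lists before running the remove command. In this sketch the function name and the temporary file names are hypothetical; it relies on conda list --export, which prints one name=version=build line per package plus comment lines starting with "#".

```shell
# Hypothetical helper: succeeds if two "conda list --export" files name the
# same packages, ignoring the "#" comment lines and the line order.
same_packages() {
    a=$(grep -v '^#' "$1" | sort)
    b=$(grep -v '^#' "$2" | sort)
    [ "$a" = "$b" ]
}

# Example use (names and locations are examples, as above):
#   conda list --export --name myenv > /tmp/old.txt
#   conda list --export --prefix ${prefix}/myenv > /tmp/new.txt
#   same_packages /tmp/old.txt /tmp/new.txt && echo "copy matches"
```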
Moving the default environment and package directories
By default, conda stores both its environment directory (variable
envs_dirs
) and its package directory (variable
pkgs_dirs
) in your home directory:
~/.conda
. As noted above and in the
HiddenPackages topic, these directories can take up a substantial amount of space as you continue to work with conda environments.
One solution is to move both of these default directories to a partition where you have enough space. For this example, let's assume that you've chosen to store both of these directories within
/nevis/olga/data/jsmith/conda
(again, this is
not a directory that actually exists on any of the Nevis
Linux cluster systems).
# Define a handy variable for this operation
prefix=/nevis/olga/data/jsmith/conda
# Add the new definitions to ~/.condarc
conda config --add envs_dirs ${prefix}/envs
conda config --add pkgs_dirs ${prefix}/pkgs
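# Move the package directory to the new location
# (To move the environments, use the clone recipe above)
# Create the destination first; "&&" stops tar from running in the wrong directory if a cd fails
mkdir -p ${prefix}
(cd ~/.conda && tar -cf - ./pkgs) | (cd ${prefix} && tar -xf -)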
# Delete the old package directory
rm -rf ~/.conda/pkgs
Any new conda environments that you create will be in the directory given by
envs_dirs
.
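Since the recipe deletes the old package directory, a more cautious variant copies the files, compares the file listings, and leaves the deletion to you. This is a sketch; copy_then_check is a hypothetical name, and comparing listings checks only that the same paths exist, not that their contents are identical.

```shell
# Hypothetical helper: copy the directory tree $1 into $2 with tar, then
# succeed only if both trees contain the same relative paths.
copy_then_check() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    (cd "$src" && tar -cf - .) | (cd "$dest" && tar -xf -) || return 1
    [ "$(cd "$src" && find . | sort)" = "$(cd "$dest" && find . | sort)" ]
}

# Example use (the destination is the example location from the text):
#   copy_then_check ~/.conda/pkgs /nevis/olga/data/jsmith/conda/pkgs && rm -rf ~/.conda/pkgs
# To confirm which directories conda will use:
#   conda config --show envs_dirs pkgs_dirs
```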
Conda and pip
Both conda and
pip
are package managers. It's important to recognize that they are
different package managers.
It's common for
python
, as a package, to be included in a conda environment. If you're going to use
pip
or
pip3
within a conda environment, then the
python
package
must be included in the conda environment. If you don't do this, the "native" version of pip will be used; in CentOS 7, this will invoke an old version of python, and the outcome will not affect the conda environment.
It's best to stick with conda to install any packages. If a given package (e.g., biopython) is available via conda, it's better to use
conda install biopython
than
pip3 install biopython
. Once you use pip to install a package,
only use pip to update that package or anything that depends on that package.
As a rule of thumb, once you use pip to modify a conda environment, stick with pip from then on.
Note: On AlmaLinux 9 systems, the commands pip
and pip3
are the same and can be used interchangeably. On CentOS 7 systems, pip
and pip3
are not the same; python
and pip
point to Python 2.7; python3
and pip3
point to Python 3.6. In CentOS 7, you almost certainly want to use the commands python3
and pip3
.
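A quick way to tell whether pip3 will act on the active conda environment or on the system Python is to look at where the command comes from. The sketch below relies on CONDA_PREFIX, which conda sets when an environment is activated; the function name pip_is_from_env is hypothetical.

```shell
# Hypothetical helper: succeeds only if the pip3 found first on PATH lives
# inside the active conda environment (the directory named by CONDA_PREFIX).
pip_is_from_env() {
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    pip_path=$(command -v pip3) || return 1
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if pip_is_from_env; then
    echo "pip3 belongs to $CONDA_PREFIX"
else
    echo "pip3 is not from an active conda environment"
fi
```

If the second message appears while an environment is active, the environment most likely does not include the python package.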