Conda
Conda is a popular package management system. Here are some tips and tricks for using that system on the Nevis particle-physics systems.
Before you go to conda
One thing I've noticed is that users are using conda to install packages that are already available in the Nevis environment modules. Before you use conda to get access to packages like numpy or matplotlib, consider an alternative: use this command to load the Python version and libraries that I've installed on our applications server:
module load python
This will load the latest version of Python (3.10 as of May-2022) from the Linux cluster software libraries (use module avail python to see all the available versions). You can see which packages I've installed as part of our Python distribution with:
pip freeze
Take a look. You might be surprised by what I've included. As of May-2022, the list includes:
wheel jupyter jupyterlab iminuit numpy scipy matplotlib pandas sympy terminado urllib3 tables rootpy rootkernel uproot scikit-learn tensorflow keras torch torchvision scikit-hep h5py astropy gammapy fitsio healpy astropy-healpix cython numba numba_stats
All the packages you see on that list are also available via our notebook server.
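If you just want to know whether one particular package is already part of that distribution, you can filter the pip freeze output. Here's a minimal bash sketch. The function name has_package is hypothetical; it is not something installed at Nevis. It assumes the usual name==version lines (or "name @ location" for some packages). It compares names roughly the way pip normalizes them: case-insensitively, with runs of "-", "_" and "." treated as equivalent.

```shell
# has_package NAME: exit status 0 if NAME appears in pip-freeze-style text
# read from standard input, 1 otherwise. (Hypothetical helper, not part of pip.)
has_package() {
    awk -F'[= ]' -v want="$1" '
        # Lower-case the name and collapse runs of "-", "_" and "." to "-"
        function norm(s) { s = tolower(s); gsub(/[-_.]+/, "-", s); return s }
        BEGIN { want = norm(want) }
        norm($1) == want { found = 1 }
        END { exit !found }'
}

# Example:
#   module load python
#   pip freeze | has_package numpy && echo "numpy is already provided"
```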
If you want a package that's not on the list, you can install it in your home directory with:
pip install --user --upgrade <package-name>
If a package is of sufficient interest that other users may want it, let WilliamSeligman know. He'll install it on both the library server and the notebook server.
This approach has some advantages:
- It saves disk space. If every user starts installing entire software suites in their home directories, there's less space available for other work.
- The library server directory /usr/nevis is available to every system in the Linux cluster, including the batch nodes. It might simplify the task of making necessary software packages available for your condor jobs.
Conda and module load can be mixed, but it's tricky. If you try to use both at once, carefully consider what each will do to your $PATH and $LD_LIBRARY_PATH environment variables.
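One way to see what each tool does is to print these variables one entry per line before and after running module load or conda activate, and then compare the two lists. Here's a minimal sketch. The name show_path_var is hypothetical, and the function relies on bash's ${!name} indirect expansion, so it needs bash.

```shell
# show_path_var VARNAME: print a colon-separated variable (PATH,
# LD_LIBRARY_PATH, ...) one entry per line. Hypothetical helper; needs bash.
show_path_var() {
    local value=${!1}
    if [ -n "$value" ]; then
        printf '%s\n' "$value" | tr ':' '\n'
    fi
}

# Example: see what "module load python" changes in your PATH.
#   show_path_var PATH > /tmp/path-before.txt
#   module load python
#   show_path_var PATH > /tmp/path-after.txt
#   diff /tmp/path-before.txt /tmp/path-after.txt
```

The bash builtin type -a python is also handy here. It lists every python on your PATH in search order, so you can see which one will actually run.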
Setting up conda
If you need a particular version of a package, or the software you need is not a Python-based package, then it makes sense to use conda.
You don't have to install conda (or miniconda or anaconda). Conda version 4.6.14 is already installed on every system in the Nevis particle-physics Linux cluster.
The default channel you'll probably want to assign for typical scientific research packages is conda-forge:
conda config --add channels conda-forge
conda config --set channel_priority strict
Your shell’s prompt will be changed by conda. Even when you’re not using conda, the text (base) will appear at the beginning of the prompt. If this doesn’t bother you, ignore it. If it does, you can try:
conda config --set auto_activate_base false
You’ll have to log out and then log in again to see the change. If you don’t want conda to alter your prompt even when you’re using an environment, this command will suppress conda’s prompt changes:
conda config --set changeps1 False
Conda environments by name
I strongly advise you to read the Conda documentation on managing environments. What follows are excerpts from that page that are the most relevant to our work at Nevis.
You can create a conda environment in your home directory with a command like this (note that the name jupyter-pyroot in the following examples is arbitrary):
conda create --name jupyter-pyroot jupyter python root
You can add packages to it later; e.g.:
conda install --name jupyter-pyroot jupyterlab numpy scipy matplotlib
Note: The conda example packages above are already available if you use module load root. All the above-named packages are there, since module load root also loads a corresponding version of Python.
To activate an environment and make its packages available to you:
conda activate jupyter-pyroot
Conda environments by location
The approach in the above section is good enough if you're working on your laptop, but it's probably not the best approach when working on the Nevis cluster:
- It uses up space in your home directory. Even a small list of conda packages takes up a few gigabytes. If everyone in a group does this, the /home partition on your login server may fill up with copies of the exact same environment, one for each user.
- If you want an environment accessible in a shared directory for all your condor jobs, or to share with multiple users, you'll want a different method.
That method is to define an environment by location:
# Find some place in your file-server directory hierarchy with enough space to suit your needs.
# Note that there is no machine named 'olga' at Nevis
cd /nevis/olga/data/
# Create an appropriate directory if one doesn't already exist
# Note that the name 'myenv' is arbitrary.
mkdir myenv
# Create the conda environment within that directory.
# Again, this list of packages is an example (and unnecessary!)
conda create --prefix ./myenv jupyter python root
From that point forward, you can activate the environment by giving its full path:
conda activate /nevis/olga/data/myenv
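The steps above can be wrapped in a small bash function. This is a sketch, not a Nevis-provided tool. The name make_prefix_env is hypothetical, and the function passes conda's --yes option to skip the confirmation prompt. It lets conda create the environment directory itself (conda create --prefix does this), and it refuses to touch a directory that already exists.

```shell
# make_prefix_env DIRECTORY PACKAGE...: create a conda environment at an
# explicit location instead of under ~/.conda. Hypothetical helper.
make_prefix_env() {
    local dir=$1
    shift
    if [ -e "$dir" ]; then
        echo "make_prefix_env: $dir already exists; not touching it" >&2
        return 1
    fi
    # Make sure the parent directory exists; conda creates $dir itself.
    mkdir -p "$(dirname "$dir")" || return 1
    conda create --yes --prefix "$dir" "$@"
}

# Example (the location and package list are only examples):
#   make_prefix_env /nevis/olga/data/myenv jupyter python root
#   conda activate /nevis/olga/data/myenv
```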
Copying a conda environment
I'm going to assume a "use case": you've got a conda environment in your home directory, and you now want to move it to a location-based environment instead of a name-based one.
If you used pip to install a Python package within this conda environment, then it might be copied over by this procedure. I've tried it a couple of times; once it worked, another time it didn't.
# See the name of any conda environments you've got
conda info --envs
# Create the new location of the environment if it's not already there.
# Again, all these names and locations are examples.
mkdir -p /nevis/olga/data/environments
# Clone the conda environment (assume your original environment is named 'myenv')
conda create --prefix /nevis/olga/data/environments/myenv --clone myenv
After copying the environment, you can delete the old one:
conda remove --name myenv --all
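Because that last step deletes the original, it's safer to run it only if the clone succeeded. Here's a minimal bash sketch. The name move_env_to_prefix is hypothetical, and --yes skips conda's confirmation prompts.

```shell
# move_env_to_prefix NAME DESTINATION: clone a name-based environment to an
# explicit location, then remove the original only if the clone succeeded.
# Hypothetical helper.
move_env_to_prefix() {
    local name=$1 dest=$2
    mkdir -p "$(dirname "$dest")" || return 1
    if ! conda create --yes --prefix "$dest" --clone "$name"; then
        echo "move_env_to_prefix: clone failed; keeping $name" >&2
        return 1
    fi
    conda remove --yes --name "$name" --all
}

# Example:
#   move_env_to_prefix myenv /nevis/olga/data/environments/myenv
```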
Moving the default environment and package directories
By default, conda stores both its environment directory (variable envs_dirs) and its package directory (variable pkgs_dirs) in your home directory, under ~/.conda. As noted above, these directories can begin to take up a substantial amount of space as you continue to work with conda environments.
One solution is to move both of these directories to a partition where you have enough space. For this example, let's assume that you've chosen to store both of these directories within /nevis/olga/data/jsmith/conda (again, this is not a directory that actually exists on any of the Nevis Linux cluster systems).
# Define a handy variable for this operation
prefix=/nevis/olga/data/jsmith/conda
# Make sure the new location exists before copying anything into it
mkdir -p "${prefix}"
# Add the new definitions to ~/.condarc
conda config --add envs_dirs "${prefix}/envs"
conda config --add pkgs_dirs "${prefix}/pkgs"
# Move the package directory to the new location
# (To move the environments, use the clone recipe above)
(cd ~/.conda && tar -cf - ./pkgs) | (cd "${prefix}" && tar -xf -)
# Delete the old package directory (only after checking that the copy is complete)
rm -rf ~/.conda/pkgs
Any new conda environments that you create will be in the directory given by envs_dirs.
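A shell pipeline normally reports only the exit status of its last command, so a failed copy could go unnoticed before the rm -rf step. Here's a more cautious bash sketch. The name move_dir_with_tar is hypothetical. The function turns on pipefail inside a subshell and deletes the original only if both sides of the tar pipe succeeded.

```shell
# move_dir_with_tar SRC_PARENT NAME DEST_PARENT: copy SRC_PARENT/NAME into
# DEST_PARENT with a tar pipe, then delete the original only if the copy
# succeeded. Hypothetical helper; needs bash for pipefail.
move_dir_with_tar() {
    local src_parent=$1 name=$2 dest_parent=$3
    mkdir -p "$dest_parent" || return 1
    if ! ( set -o pipefail
           (cd "$src_parent" && tar -cf - "./$name") |
               (cd "$dest_parent" && tar -xf -) ); then
        echo "move_dir_with_tar: copy failed; leaving $src_parent/$name alone" >&2
        return 1
    fi
    # ${var:?} aborts instead of expanding to an empty path
    rm -rf "${src_parent:?}/${name:?}"
}

# Example, with prefix defined as in the recipe above:
#   move_dir_with_tar ~/.conda pkgs "${prefix}"
```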
Conda and pip
Both conda and pip are package managers. It's important to recognize that they are different package managers.
It's common for python, as a package, to be included in a conda environment. If you're going to use pip within a conda environment, then the python package must be included in the conda environment. If you don't do this, the native CentOS 7 version of pip will be used; it works with an older version of Python, and the changes it makes will not apply to the conda environment.
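Before running pip inside an environment, you can check that the pip on your PATH really belongs to the active environment. conda activate sets CONDA_PREFIX to the environment's directory. The bash function below compares that directory with the location of pip. The name pip_is_from_active_env is hypothetical, and the function assumes the usual layout where an environment's executables live in its bin subdirectory.

```shell
# pip_is_from_active_env: exit status 0 if the pip found on PATH lives in
# the bin directory of the active conda environment ($CONDA_PREFIX).
# Hypothetical helper.
pip_is_from_active_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "pip_is_from_active_env: no conda environment is active" >&2
        return 1
    fi
    local pip_path
    if ! pip_path=$(command -v pip); then
        echo "pip_is_from_active_env: no pip on PATH" >&2
        return 1
    fi
    case "$pip_path" in
        "$CONDA_PREFIX"/bin/*) return 0 ;;
        *)
            echo "pip_is_from_active_env: $pip_path is not in $CONDA_PREFIX/bin" >&2
            return 1
            ;;
    esac
}

# Example:
#   conda activate /nevis/olga/data/myenv
#   pip_is_from_active_env && pip install <package-name>
```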
It's best to stick with conda to install any packages. If a given package (e.g., biopython) is available via conda, it's better to use conda install biopython than pip install biopython. Once you use pip to install a package, only use pip to update that package or anything that depends on that package.
As a rule of thumb, once you use pip to modify a conda environment, stick with pip from then on.