Hidden directories and package management

If you use either conda or pip to manage software packages, or work with keras, you may have encountered something strange: your home directory runs out of disk space, even though your files are small.

The reason is likely hidden directories. By default in UNIX, files and directories whose names begin with a period (.) are not listed by the ls command. To see these files, use ls -a. For example:

> ls
CreateSubdirectories.C     Untitled.ipynb

> ls -a
.              .condarc                .ipython    .profile            .viminfo
..             .condor                 .jupyter    .root_hist          .vimrc
.astropy       .config                 .local      .python_history     .spamassassin      .Xauthority
.bash_history  CreateSubdirectories.C  .myprofile  .ssh                .zprofile
.bashrc        .npm                    Untitled.ipynb
.cache         .ipynb_checkpoints      .pyzor      .vim

In UNIX, files and directories created for bookkeeping purposes often are hidden, so you can focus on what you've created for your own work. As you can see from the example above, many programs create additional files that you don't usually see. For example, you can use hidden files like .myprofile to configure your shell environment.

For the most part, these hidden files are small. Among the exceptions are the directories used by conda and pip to store the packages you download to create custom software environments.

Here are a couple of examples:

The conda packages are typically stored in ~/.conda. I use the du command (for "disk usage") to see how much disk space is used by the directory:

> cd
> du -shx .conda
4.0G	.conda

The pip download cache is normally stored under ~/.cache (in ~/.cache/pip), and the python packages that pip installs for your account under ~/.local:

> cd
> du -shx .cache
4.8G	.cache
> du -shx .local
1.3G	.local

As you can see from the above examples, a substantial fraction of your available disk storage can be used by these utilities, in a place where you may not easily see them.

The rest of this page discusses how to store these directories outside of their default locations, presumably on your group's file servers, which have much more free space. You only have to go through the following steps once; any changes you've made to your configuration files will be preserved between login sessions.

If you don't understand what an example path like /nevis/olga/data means, you may want to review the automount page. Although I suggest storing your package-management directories on a file server, don't forget the warning from the disk guide: files in /data partitions are not backed up. You may want to periodically save the names of the packages you've downloaded so that you can re-create your environments should the worst occur, e.g., with conda list or pip freeze.
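Here's a minimal sketch of how you might save those package lists. The backup location and the file names are my assumptions, not a Nevis convention; the point is to put the lists somewhere that is backed up, such as your home directory. The lists themselves are tiny.

```shell
# Save package lists to a dated directory so an environment can be rebuilt.
# The location below is only an example; edit it to suit yourself.
backupDir="${HOME}/package-lists/$(date +%Y-%m-%d)"
mkdir -p "${backupDir}"

# Each tool is optional, so skip the ones that aren't installed.
if command -v conda >/dev/null 2>&1; then
    # Lists the packages in the currently active conda environment.
    conda list --export > "${backupDir}/conda-packages.txt"
fi
if command -v pip >/dev/null 2>&1; then
    # Lists the packages pip can see, in requirements-file format.
    pip freeze > "${backupDir}/pip-requirements.txt"
fi

ls -l "${backupDir}"
```

To rebuild later, something like conda create --name myenv --file conda-packages.txt or pip install -r pip-requirements.txt should work, though package versions that have since been withdrawn may need adjusting by hand.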

Maybe it's already there

Before creating a custom environment, consider looking at the packages already installed on the Nevis Linux cluster. It might be that all the packages you need are already there. There's a list in the conda topic.

Conda file management

This is covered in more detail in the conda wiki page. Here's the short version.

# Define a handy variable for these operations. The variable
# $USER is your account name on the UNIX system. 
# Note that /nevis/olga/data/$USER/conda does not exist at
# Nevis; this is an example that you'll have to edit to suit
# your group's file servers. 
prefix=/nevis/olga/data/$USER/conda

# Create the new package environment directory.
mkdir -p ${prefix}/envs

# If you have an existing environment, clone it to the new location;
# assume your original environment is named 'myenv'.
conda create --prefix ${prefix}/envs/myenv --clone myenv

# Move the package directory to the new location
(cd ~/.conda; tar -cf - ./pkgs) | (cd ${prefix}; tar -xf -)

# Delete the old environment
conda remove --name myenv --all

# Delete the old package directory
rm -rf ~/.conda/pkgs

# Add the new definitions to ~/.condarc
conda config --add envs_dirs ${prefix}/envs
conda config --add pkgs_dirs ${prefix}/pkgs
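After those steps, you may want to confirm that conda has picked up the new locations. This is a sketch of one way to check; show_conda_dirs is a name I made up for this page, not a conda command.

```shell
# 'show_conda_dirs' is a hypothetical helper name, not part of conda.
show_conda_dirs() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda is not in your PATH; nothing to check"
        return 0
    fi
    # Print the directory lists that conda will actually use.
    conda config --show envs_dirs pkgs_dirs
    # List the environments conda can find, with their locations.
    conda env list
}

show_conda_dirs
```

If the directories under your new prefix appear at the top of both lists, new environments and downloaded packages should land there instead of in ~/.conda.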

pip and python locations

Note: This section is meant for those who use pip outside of a conda environment. If you're using pip within a conda environment (which I don't recommend if you can avoid it; see the conda topic), then the version of pip within that environment will automatically store files within that environment's directory.

There are two aspects to pip's package management: The location of its cache, and the location in which it installs packages. Both of these directories can grow quite large.

If you're using multiple python versions (this is extremely likely, even if you're not aware of it), you want the path of the package directory to include the python version somehow. In the example below, I'm using fancy UNIX tricks to make that happen.

The following command looks cryptic. What it does is detect the version of Python you're running and set the variable $pvers to 'pythonM.N', where M.N is the major and minor version of Python. For example, if you're using Python 3.9.6, $pvers will be set to 'python3.9'.

pvers=$(echo $(python --version) | sed "s/Python \([0-9]\+\.[0-9]\+\).*/python\1/")
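If the sed expression is too cryptic for your taste, an alternative is to ask Python itself for its version. This is a sketch that should give the same result; the fallback to python3 is my assumption, for systems that have no plain python command.

```shell
# Use 'python' if it exists, otherwise fall back to 'python3'.
py=$(command -v python || command -v python3)

# Ask the interpreter for its own major.minor version.
pvers=$("${py}" -c 'import sys; print("python%d.%d" % sys.version_info[:2])')
echo "${pvers}"
```

Either way, check the value of $pvers with echo before you use it in a directory name.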

For these examples, assume you want to store pip's cache in /nevis/olga/data/$USER/cache, and have pip store the python libraries in /nevis/olga/data/$USER/${pvers}/site-packages. Note that these names are examples; in particular, there's no machine named olga at Nevis.

To change the default location of pip's directories, you have to modify pip's configuration file. Use variables (to refer to the directories later) and pip config to set new values:

# Define variables for our pip/python directories. 
cacheDir=/nevis/olga/data/$USER/cache
pkgDir=/nevis/olga/data/$USER/${pvers}/site-packages

# Modify pip's configuration
pip config set global.cache-dir ${cacheDir}
pip config set global.target ${pkgDir}

Create these directories. You can copy your existing files if you wish:

mkdir -p ${cacheDir}
mkdir -p ${pkgDir}

# Copy the old cache files to the new locations. This is
# optional. 
(cd ~/.cache/pip && tar -cf - .) | (cd ${cacheDir} && tar -xvf -)
# For the libraries, you may have to reinstall them. The following
# line may work, but I make no promises:
(cd $(python -m site --user-site) && tar -cf - .) | (cd ${pkgDir} && tar -xvf -)

# Delete the old pip cache directory
rm -rf ~/.cache/pip
# Erase the contents of your python packages directory (see below).
# The && makes sure that nothing is deleted if the cd fails.
(cd $(python -m site --user-site) && rm -rf ./*)

You've instructed pip to install the python packages in a new location. Now you have to tell python to search this location. You can do this by modifying the $PYTHONPATH shell variable, but you have to do this for every login session. What I did was to create a .pth file. To find out where to put the file:

python -m site --user-site

To create the file, I got fancy with shell scripting (the command in a $() expression is executed and its output is put into the command line):

pthFile=$(python -m site --user-site)/my-packages.pth

Note that you can use any name of the form name.pth for the path-name file. In my-packages.pth, you would put in the location of your relocated python library; here that's what's in the variable ${pkgDir}. I'm using another UNIX shell trick to add the name of the new package directory to the end of my .pth file:

# Make sure the directory for the .pth file exists, then append.
mkdir -p $(dirname ${pthFile})
echo ${pkgDir} >> ${pthFile}

Again, I'm aware that all of the above is cryptic to anyone unfamiliar with UNIX shells. Assuming that you cut-and-paste the above commands, and edit the name olga to something suitable for your servers, you will hopefully end up with a consistent configuration for current and future pip install commands. You can check with:

pip config list
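That shows pip's side of things. To check python's side, you can ask whether the relocated directory is on python's search path. This is a sketch; in_python_path is a helper name I invented, and the fallback path is only an example (not a real Nevis directory).

```shell
# 'in_python_path' is a hypothetical helper: it succeeds if the given
# directory is on the search path of the python found in your $PATH.
in_python_path() {
    py=$(command -v python || command -v python3)
    "${py}" -c '
import os, sys
target = os.path.realpath(sys.argv[1])
found = any(os.path.realpath(p) == target for p in sys.path if p)
sys.exit(0 if found else 1)
' "$1"
}

# pkgDir is the variable defined earlier on this page; fall back to an
# example path if it is not set.
dir="${pkgDir:-/nevis/olga/data/example/site-packages}"
if in_python_path "${dir}"; then
    echo "python will search ${dir}"
else
    echo "python will NOT search ${dir}; check your .pth file"
fi
```

Note that python skips any line in a .pth file that names a directory that doesn't exist, so if the check fails, first make sure the directory has actually been created.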

keras downloads

There is one other hidden directory that I've seen that can grow unexpectedly large: ~/.keras. It's used to download pre-trained models. Here's an example:

> cd
> du -shx .keras
2.6G    .keras 

At the time of this writing (Jul-2023), there is only one way to re-direct keras downloads to a different directory. That is to set the shell variable $KERAS_HOME to a different value before you start a program that invokes keras. For example:

export KERAS_HOME="/nevis/olga/data/$USER/keras"
# Then start jupyter or run python or whatever.

You can put the above statement (suitably edited, since the machine olga does not exist at Nevis) into your ~/.myprofile file so you can be assured that the variable will be set at every login session.
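One possible refinement for ~/.myprofile is to set the variable only if the directory can actually be used, for example in case the file server is unavailable when you log in. This is a sketch under that assumption; set_keras_home is a name I made up.

```shell
# 'set_keras_home' is a hypothetical helper. It sets $KERAS_HOME only if
# the directory exists (or can be created) and is writable.
set_keras_home() {
    if mkdir -p "$1" 2>/dev/null && [ -w "$1" ]; then
        export KERAS_HOME="$1"
    else
        echo "Warning: cannot use $1; KERAS_HOME left unchanged" >&2
        return 1
    fi
}

# In ~/.myprofile you might then write (edit the path; the machine
# olga does not exist at Nevis):
#   set_keras_home "/nevis/olga/data/$USER/keras"
```

If the function prints its warning, $KERAS_HOME keeps whatever value it had before the call.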

Is that all?

No. This page focuses on unexpectedly large hidden directories that have occurred on the Nevis particle-physics Linux cluster. Depending on your work and the programs you use, there may be other "hidden" directories that store files that you did not expect, such as in the Library folder on macOS. If you use python virtualenv in a project, you may want to look at the directories venv or .venv in the project's directory, or ~/.virtualenvs in your home directory.

Treat this page as a guide to finding other such directories: ls -a to find hidden files/directories, du to see how much space they use.
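The two commands can be combined into one step. This is a sketch, assuming a Bourne-type shell such as bash; hidden_usage is a name I made up, not a standard command.

```shell
# List the hidden files and directories directly under a directory,
# smallest first, so the biggest consumers end up at the bottom.
# 'hidden_usage' is a hypothetical helper name.
hidden_usage() {
    dir="${1:-$HOME}"
    # .[!.]* matches names that start with a period, but skips . and ..
    # -s: one total per entry; -k: sizes in kilobytes;
    # -x: stay on one file system
    ( cd "$dir" && du -skx .[!.]* 2>/dev/null | sort -n )
}

# Show the five largest hidden entries in your home directory.
hidden_usage "$HOME" | tail -5
```

The sizes are in kilobytes so that sort -n can order them; once you know which directory is the culprit, du -shx on that directory gives a more readable number.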

Topic revision: r10 - 2023-11-16 - WilliamSeligman
 